Abstract


We present an Angluin-style algorithm to learn nominal automata, 
which are acceptors of languages over infinite (structured) alphabets. 
The abstract approach we take allows us to seamlessly extend 
known variations of the algorithm to this new setting. In particular 
we can learn a subclass of nominal non-deterministic automata. 
An implementation using a recently developed Haskell library for 
nominal computation is provided for preliminary experiments. 


Categories and Subject Descriptors D.1.1 [Software]: Programming
Techniques; F.4.3 [Mathematical Logic and Formal Languages]:
Formal Languages; 1.3.2 [Artificial Intelligence]: Learning


Keywords Active Learning, (Non)Deterministic Finite Automata, 
Nominal Automata, Functional Programming 


1. Introduction 


Automata are a well established computational abstraction with a 
wide range of applications, including modelling and verification of 
(security) protocols, hardware, and software systems. In an ideal 
world, a model would be available before a system or protocol 
is deployed in order to provide ample opportunity for checking 
important properties that must hold and only then the actual system 
would be synthesized from the verified model. Unfortunately, this 
is not at all the reality: Systems and protocols are developed and 
coded in short spans of time and if mistakes occur they are most 
likely found after deployment. In this context, it has become popular 
to infer or learn a model from a given system just by observing its 
behaviour or response to certain queries. The learned model can 
then be used to ensure the system complies with desired properties
or to detect bugs and design possible fixes. 

Automata learning, or regular inference [3], is a widely used
technique for creating an automaton model from observations. The
original algorithm [3], by Dana Angluin, works for deterministic
finite automata, but it has since been extended to other types of
automata [1][4][35], including Mealy machines and I/O automata,
and even a special class of context-free grammars. Angluin’s
algorithm is sometimes referred to as active learning, because it is based
on direct interaction of the learner with an oracle (“the Teacher”)
that can answer different types of queries. This is in contrast with
passive learning, where a fixed set of positive and negative examples
is given and no interaction with the system is possible.

In this paper, staying in the realm of active learning, we will 
extend Angluin’s algorithm to a richer class of automata. We are 
motivated by situations in which a program model, besides control 
flow, needs to represent basic data flow, where data items are 
compared for equality (or for other theories such as total ordering). 
In these situations, values for individual symbols are typically drawn 
from an infinite domain and automata over infinite alphabets become 
natural models, as witnessed by a recent trend [2][9][12][15][17].

One of the foundational approaches to formal language theory 
for infinite alphabets uses the notion of nominal sets [9]. The theory 
of nominal sets originates from the work of Fraenkel in 1922, and 
they were originally used to prove the independence of the axiom of 
choice and other axioms. They have been rediscovered in Computer 
Science by Gabbay and Pitts [B6], as an elegant formalism for 
modeling name binding, and since then they form the basis of many 
research projects in the semantics and concurrency community. In 
a nutshell, nominal sets are infinite sets equipped with symmetries 
which make them finitely representable and tractable for algorithms. 
We make crucial use of this feature in the development of a learning 
algorithm. 

Our main contributions are the following. 

• A generalization of Angluin’s original algorithm to nominal
automata. The generalization follows a generic pattern for
transporting computation models from finite sets to nominal
sets, which leads to simple correctness proofs and opens the
door to further generalizations. The use of nominal sets with
different symmetries also creates potential for generalization,
e.g. to languages with time features or data dependencies
represented as graphs.

• An extension of the algorithm to nominal non-deterministic
automata (nominal NFAs). To the best of our knowledge, this is 
the first learning algorithm for non-deterministic automata over 
infinite alphabets. It is important to note that, in the nominal 
setting, NFAs are strictly more expressive than DFAs. We 
learn a subclass of the languages accepted by nominal NFAs, 
which includes all the languages accepted by nominal DFAs. 
The main advantage of learning NFAs directly is that they 
can provide exponentially smaller automata when compared 
to their deterministic counterpart. This can be seen both as a 
generalization and as an optimization of the algorithm. 


• An implementation using our recently developed Haskell library
tailored to nominal computation, NLambda [26]. Our
implementation is the first non-trivial application of a novel
programming paradigm of functional programming over infinite
structures, which allows the programmer to rely on convenient
intuitions of searching through infinite sets in finite time.


L* LEARNER
 1  S, E ← {ε}
 2  repeat
 3      while (S, E) is not closed or not consistent
 4          if (S, E) is not closed
 5              find s₁ ∈ S, a ∈ A such that
                    row(s₁a) ≠ row(s), for all s ∈ S
 6              S ← S ∪ {s₁a}
 7          if (S, E) is not consistent
 8              find s₁, s₂ ∈ S, a ∈ A, and e ∈ E such that
                    row(s₁) = row(s₂) and ℒ(s₁ae) ≠ ℒ(s₂ae)
 9              E ← E ∪ {ae}
10      Make the conjecture M(S, E)
11      if the Teacher replies no, with a counter-example t
12          S ← S ∪ prefixes(t)
13  until the Teacher replies yes to the conjecture M(S, E).
14  return M(S, E)


Figure 1. Angluin’s algorithm for deterministic finite automata

The paper is organized as follows. In Section 2 we present an
overview of our contributions (and the original algorithm), highlighting
the challenges we faced in the various steps. In Section 3 we
review some basic concepts of nominal sets and automata. Section 4
contains the core technical contributions of our paper: The new
algorithm and proof of correctness. In Section 5 we describe an
algorithm to learn nominal non-deterministic automata. Section 6
contains a description of NLambda, details of the implementation,
and results of preliminary experiments. Section 7 contains a
discussion of related work. We conclude the paper with a discussion
section in which future directions are also presented.


2. Overview of the Approach 


In this section, we give an overview of the work developed in the
paper through examples. We will start by explaining the original
algorithm for regular languages over finite alphabets, and then
explain the challenges in extending it to nominal languages.

Angluin’s algorithm L* provides a procedure to learn the minimal
DFA accepting a certain (unknown) language ℒ. The algorithm has
access to a teacher which answers two types of queries:

• membership queries, consisting of a single word w ∈ A*, to
which the teacher will reply whether w ∈ ℒ or not;

• equivalence queries, consisting of a hypothesis DFA H, to
which the teacher replies yes if ℒ(H) = ℒ, and no otherwise,
providing a counterexample w ∈ ℒ(H) △ ℒ (△ denotes the
symmetric difference of two languages).

The learning algorithm works by incrementally building an observation
table, which at each stage contains partial information about the
language ℒ. The algorithm is able to fill the table with membership
queries. As an example, and to set notation, consider the following
table (over A = {a, b}).


This table indicates that ℒ contains at least aa and definitely
does not contain the words ε, a, b, ba, baa, aaa. Since row is fully
determined by the language ℒ, we will from now on refer to an
observation table as a pair (S, E), leaving the language ℒ implicit.
Given an observation table (S, E) one can construct a deterministic
automaton M(S, E) = (Q, q₀, δ, F) where
• Q = {row(s) | s ∈ S} is a finite set of states;
• F = {row(s) | s ∈ S, row(s)(ε) = 1} ⊆ Q is the set of final
states;
• q₀ = row(ε) is the initial state;
• δ: Q × A → Q is the transition function given by
δ(row(s), a) = row(sa).
For this to be well-defined, we need to have ε ∈ S (for the initial
state) and ε ∈ E (for final states), and for the transition function
there are two crucial properties of the table that need to hold:
Closedness and consistency. An observation table (S, E) is closed
if for all t ∈ S·A there exists an s ∈ S such that row(t) = row(s).
An observation table (S, E) is consistent if, whenever s₁ and s₂
are elements of S such that row(s₁) = row(s₂), for all a ∈ A,
row(s₁a) = row(s₂a). Each time the algorithm constructs an
automaton, it poses an equivalence query to the teacher. It terminates
when the answer is yes, otherwise it extends the table with the
counterexample provided.
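
The definitions of row, closedness, consistency and M(S, E) can be made concrete with a minimal Python sketch for a finite alphabet. The function names, the encoding of words as strings and of rows as tuples, and the simulated membership oracle for the example language {aa, bb} are illustrative assumptions, not part of the algorithm's original presentation.

```python
def row(s, E, member):
    """row(s): membership answers for s·e, for every column e in the ordered list E."""
    return tuple(member(s + e) for e in E)

def is_closed(S, E, alphabet, member):
    """Every row of S·A already occurs as the row of some s in S."""
    upper = {row(s, E, member) for s in S}
    return all(row(s + a, E, member) in upper for s in S for a in alphabet)

def is_consistent(S, E, alphabet, member):
    """Rows that are equal stay equal after appending any letter."""
    return all(row(s1 + a, E, member) == row(s2 + a, E, member)
               for s1 in S for s2 in S
               if row(s1, E, member) == row(s2, E, member)
               for a in alphabet)

def hypothesis(S, E, alphabet, member):
    """M(S, E) = (Q, q0, delta, F); assumes a closed, consistent table with "" in S and E."""
    eps = E.index("")
    Q = {row(s, E, member) for s in S}
    q0 = row("", E, member)
    F = {q for q in Q if q[eps] == 1}
    delta = {(row(s, E, member), a): row(s + a, E, member)
             for s in S for a in alphabet}
    return Q, q0, delta, F

# Simulated membership oracle (an assumption for this example): the language {aa, bb}.
member = lambda w: int(w in {"aa", "bb"})
S = ["", "a", "aa"]

closed_1 = is_closed(S, [""], "ab", member)
# Rows of "" and "a" agree on column "", but rows of "a" and "aa" differ.
consistent_1 = is_consistent(S, [""], "ab", member)
# Adding the column "a" separates "" from "a".
consistent_2 = is_consistent(S, ["", "a"], "ab", member)
Q, q0, delta, F = hypothesis(S, ["", "a"], "ab", member)
```

With the single column ε the table is closed but not consistent; after adding the column a it is consistent, and the resulting hypothesis has three states.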


2.1 Simple Example of Execution 


Angluin’s algorithm is displayed in Figure 1. Throughout this
section, we will consider the language(s)

$$\mathcal{L}_n = \{ww \mid w \in A^*,\ |w| = n\}.$$

If the alphabet A is finite then ℒₙ is regular for any n ∈ ℕ, and
there is a finite DFA accepting it.

The language ℒ₁ = {aa, bb} looks trivial, but the minimal DFA
recognizing it has as many as 5 states. Angluin’s algorithm will
terminate in (at most) 5 steps. We illustrate some relevant ones.

Step 1. We start from S, E = {ε}, and we fill the entries of the
table below by asking membership queries for ε, a and b. The table
is closed and consistent, so we construct the hypothesis A₁.


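      | ε
  ----+---
  ε   | 0
  ----+---
  a   | 0
  b   | 0

[Diagram: hypothesis A₁, a single non-accepting state q₀ = row(ε) = {ε ↦ 0} with self-loops labelled a and b]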


The Teacher replies no and gives the counterexample aa, which is in
ℒ₁ but is not accepted by A₁. Therefore, line 12 of the algorithm
is triggered and we set S ← S ∪ {a, aa}.

Step 2. The table becomes the one on the left below. It is closed,
but not consistent: Rows ε and a are identical, but appending a leads
to different rows. Therefore, line 9 is triggered and an
extra column a is added, giving the table on the right. The new table
is closed and consistent and a new hypothesis A₂ is constructed.


      | ε              | ε  a
  ----+---         ----+------
  ε   | 0          ε   | 0  0
  a   | 0          a   | 0  1
  aa  | 1          aa  | 1  0
  ----+---         ----+------
  b   | 0          b   | 0  0
  ab  | 0          ab  | 0  0
  aaa | 0          aaa | 0  0
  aab | 0          aab | 0  0

[Diagram: hypothesis A₂]


The Teacher again replies no and gives the counterexample bb,
which should be accepted by A₂ but is not. Therefore we put
S ← S ∪ {b, bb}.

Step 3. The new table is the one on the left. It is closed, but ε and
b violate consistency, when b is appended. Therefore we add the
column b and we get the table on the right, which is closed and
consistent. The new hypothesis is A₃.


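      | ε  a             | ε  a  b
  ----+------        ----+---------
  ε   | 0  0         ε   | 0  0  0
  a   | 0  1         a   | 0  1  0
  aa  | 1  0         aa  | 1  0  0
  b   | 0  0         b   | 0  0  1
  bb  | 1  0         bb  | 1  0  0
  ----+------        ----+---------
  ab  | 0  0         ab  | 0  0  0
  aaa | 0  0         aaa | 0  0  0
  aab | 0  0         aab | 0  0  0
  ba  | 0  0         ba  | 0  0  0
  bba | 0  0         bba | 0  0  0
  bbb | 0  0         bbb | 0  0  0
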
The Teacher replies no and provides the counterexample babb, so
S ← S ∪ {ba, bab}.


Step 4. One more step brings us to the correct hypothesis A₄
(details are omitted).

[Diagram: hypothesis A₄]
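
A run of this kind can be reproduced with a compact Python sketch of the loop in Figure 1. This is a sketch under stated assumptions: words are strings, the Teacher's equivalence query is approximated by the hypothetical helper `bounded_teacher`, which compares the hypothesis with the target on all words up to a fixed length, and the counterexamples it returns need not coincide with the ones chosen in the steps above.

```python
from itertools import product

def lstar(alphabet, member, equivalent):
    """Angluin's L* (Figure 1) over a finite alphabet; words are strings."""
    S, E = [""], [""]

    def row(s):
        return tuple(member(s + e) for e in E)

    def unclosed():
        # Witness for a closedness violation (line 5), or None.
        upper = {row(s) for s in S}
        return next((s + a for s in S for a in alphabet
                     if row(s + a) not in upper), None)

    def inconsistent():
        # Witness ae for a consistency violation (line 8), or None.
        return next((a + e for s1 in S for s2 in S if row(s1) == row(s2)
                     for a in alphabet for e in E
                     if member(s1 + a + e) != member(s2 + a + e)), None)

    while True:
        while True:                              # line 3
            t = unclosed()
            if t is not None:
                S.append(t)                      # line 6
                continue
            ae = inconsistent()
            if ae is not None:
                E.append(ae)                     # line 9
                continue
            break
        states = {row(s) for s in S}             # line 10: build M(S, E)
        q0 = row("")
        delta = {(row(s), a): row(s + a) for s in S for a in alphabet}
        final = {q for q in states if q[0]}      # E[0] is the empty word

        def accepts(w):
            q = q0
            for a in w:
                q = delta[(q, a)]
            return q in final

        cex = equivalent(accepts)                # equivalence query
        if cex is None:
            return states, accepts
        for i in range(1, len(cex) + 1):         # line 12: add all prefixes of t
            if cex[:i] not in S:
                S.append(cex[:i])

def bounded_teacher(alphabet, member, max_len):
    """Approximate equivalence oracle (assumption): test all words up to max_len."""
    def equivalent(accepts):
        for n in range(max_len + 1):
            for letters in product(alphabet, repeat=n):
                w = "".join(letters)
                if accepts(w) != member(w):
                    return w
        return None
    return equivalent

member = lambda w: w in {"aa", "bb"}
states, accepts = lstar("ab", member, bounded_teacher("ab", member, 9))
```

For this target the sketch ends with a five-state hypothesis, in line with the size of the minimal DFA mentioned at the start of this subsection. The bound 9 is chosen so that two automata with at most five states each cannot differ only on longer words.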


2.2 Learning Nominal Languages 


Consider now an infinite alphabet A = {a, b, c, d, ...}. The
language ℒ₁ becomes {aa, bb, cc, dd, ...}. The classical theory of
finite automata does not apply to languages of this kind, but one
may draw an infinite deterministic automaton that recognizes ℒ₁ in
the standard sense:


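[Diagram: infinite automaton A₅, with initial state q₀, one state qₐ for every a ∈ A, and states q₃ and q₄]

where the arrow labels A and ≠ a stand for the infinitely many transitions
labelled by elements of A and A \ {a}, respectively. This automaton is
infinite, but it can be finitely presented in a variety of ways, for
example: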


[Diagram (1): finite presentation of A₅, with the states qₐ and their transitions quantified by ∀a ∈ A]


One can formalize the quantifier notation above (or indeed the
“dots” notation above that) in several ways. A popular solution is
to consider finite register automata [18][25], i.e., finite automata
equipped with a finite number of registers where alphabet letters
can be stored and later compared for equality. Our language ℒ₁ is
recognized by a simple automaton with four states and one register.
The problem of learning register automata has been successfully
attacked before [2I].

In this paper, however, we will consider nominal automata [9]
instead. These automata ostensibly have infinitely many states, but
the set of states can be finitely presented in a way open to effective
manipulation. More specifically, in a nominal automaton the set of
states is subject to an action of permutations of a set A of atoms, and
it is finite up to that action. For example, the set of states of A₅ is:

$$\{q_0, q_3, q_4\} \cup \{q_a \mid a \in A\}$$

and it is equipped with a canonical action of permutations π: A →
A that maps every qₐ to q_{π(a)}, and leaves q₀, q₃ and q₄ fixed.
Technically speaking, the set of states has four orbits (one infinite
orbit and three fixed points) of the action of the group of permutations
of A. Moreover, it is required that in a nominal automaton the transition
relation is equivariant, i.e., closed under the action of permutations.
The automaton A₅ has this property: For example, it has a
transition from qₐ to q₃ labelled a, and for any π: A → A there is
also a transition

$$\pi(q_a) = q_{\pi(a)} \xrightarrow{\ \pi(a)\ } q_3 = \pi(q_3).$$

Nominal automata with finitely many orbits of states are
equi-expressive with finite register automata [9], but they have an
important theoretical advantage: They are a direct reformulation of
the classical notion of finite automaton, where one replaces finite
sets with orbit-finite sets and functions (or relations) with
equivariant ones. A research programme advocated in [8][9] is to transport
various computation models, algorithms and theorems along this
correspondence. This can often be done with remarkable accuracy,
and our paper is a witness to this. Indeed, as we shall see, nominal
automata can be learned with an algorithm that is almost a verbatim
copy of Angluin’s classical one.

Indeed, consider applying Angluin’s algorithm to our new
language ℒ₁. The key idea is to change the basic data structure: Our
observation table (S, E) will be such that S and E are equivariant
subsets of A*, i.e., they are closed under the canonical action
of atom permutations. In general, such a table has infinitely many
rows and columns, so the following aspects of the algorithm seem
problematic:


line 3: closedness and consistency tests range over infinite sets;


lines 5 and 8: finding witnesses for closedness or consistency
violations potentially requires checking all infinitely many rows;


line 12: every counterexample t has only finitely many prefixes, so
it is not clear how one would construct an infinite set S in finite
time. However, an infinite S is necessary for the algorithm to
ever succeed, because no finite automaton recognizes ℒ₁.


At this stage, we need to observe that due to equivariance of S, E
and ℒ₁, the following crucial properties hold:


(P1) the sets S, S·A and E admit a finite representation up to
permutations;


(P2) the function row is such that row(π(s))(π(e)) = row(s)(e),
for all s ∈ S and e ∈ E, so the observation table admits a finite
symbolic representation.


Intuitively, checking closedness and consistency, and finding a
witness for their violations, can be done effectively on the
representations up to permutations (P1). This is sound, as row is invariant
w.r.t. permutations (P2).
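
To make (P1) and (P2) tangible for the equality symmetry, the following Python sketch represents each orbit of words by a canonical word in which letters are renamed in order of first occurrence. The function names and the encoding of words as tuples are assumptions made for illustration; they are not taken from the NLambda implementation.

```python
def canonical(word):
    """Orbit representative: rename letters by order of first occurrence."""
    names = {}
    return tuple(names.setdefault(x, len(names)) for x in word)

def permute(pi, word):
    """Apply a permutation of atoms (a dict; unmentioned atoms are fixed) letter by letter."""
    return tuple(pi.get(x, x) for x in word)

def orbit_representatives(n):
    """One canonical word per orbit of the words of length n over an infinite alphabet."""
    reps = [()]
    for _ in range(n):
        # A canonical word is extended by one of its letters or by the next fresh letter.
        reps = [r + (x,) for r in reps for x in range(max(r, default=-1) + 2)]
    return reps

def in_L1(word):
    """The example language {aa, bb, cc, ...}."""
    return len(word) == 2 and word[0] == word[1]

swap = {"a": "b", "b": "a"}          # a permutation of atoms
s, e = ("a",), ("a",)

# (P1): infinitely many words of length 2, but only two orbits.
reps_2 = orbit_representatives(2)
# (P2): membership of s·e is unchanged when the same permutation acts on s and e.
invariant = in_L1(permute(swap, s) + permute(swap, e)) == in_L1(s + e)
```

Two words lie in the same orbit exactly when `canonical` maps them to the same tuple, so equivariant sets of words can be stored as finite sets of such tuples.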

We now illustrate these points through a few steps of the
algorithm for ℒ₁.
Step 1′. We start from S, E = {ε}. We have S·A = A, which
is infinite but admits a finite representation. In fact, for any a ∈
A, we have A = {π(a) | π is a permutation}. Then, by (P2),
row(π(a))(ε) = row(a)(ε) = 0, for all π, so the first table can be
written as:

      | ε
  ----+---
  ε   | 0
  ----+---
  a   | 0

It is closed and consistent. Our hypothesis is A′₁, where
δ(row(ε), x) = row(x) = q₀, for all x ∈ A. As in Step 1,
the Teacher replies with the counterexample aa.

[Diagram: hypothesis A′₁, a single state q₀ with a self-loop for every x ∈ A]


Step 2′. By equivariance of ℒ₁, the counterexample tells us that all
words of length 2 with two repeated letters are accepted. Therefore
we extend S with the (infinite!) set of such words. The new symbolic
table is:

      | ε
  ----+---
  ε   | 0
  a   | 0
  aa  | 1
  ----+---
  ab  | 0
  aaa | 0
  aab | 0


The lower part stands for elements of S·A. For instance, ab stands
for words obtained by appending a fresh letter to words of length 1
(row a). It can be easily verified that all cases are covered. Notice
that the table is different from that of Step 2: A single b is not in
the lower part, because it can be obtained from a via a permutation.
The table is closed.

Now, for consistency we need to check row(εx) = row(ax),
for all a, x ∈ A. Again, by (P2), it is enough to consider rows of the
table above. Consistency is violated, because row(a) ≠ row(aa).
We found a “symbolic” witness a for such violation. In order to fix
consistency, while keeping E equivariant, we need to add columns
for all π(a). The resulting table is

      | ε  a  b  c  …
  ----+----------------
  ε   | 0  0  0  0
  a   | 0  1  0  0
  aa  | 1  0  0  0
  ----+----------------
  ab  | 0  0  0  0
  aaa | 0  0  0  0
  aab | 0  0  0  0


where non-specified entries are 0. Only finitely many entries of
the table are relevant: row(s) is fully determined by its values on
letters in s and on just one letter not in s. For instance, we have
row(a)(a) = 1 and row(a)(a′) = 0, for all a′ ∈ A \ {a}. The
table is trivially consistent.
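
The remark that a row is determined by its values on the letters of s and on one fresh letter can be sketched in Python as follows. The sketch assumes that orbit representatives are tuples of integers numbered in order of first occurrence and that the columns are ε together with all single letters; `extensions` and `symbolic_row` are hypothetical helper names.

```python
def in_L1(word):
    """The example language {aa, bb, cc, ...} on tuples of atoms."""
    return len(word) == 2 and word[0] == word[1]

def extensions(rep):
    """Representatives of the words s·x: x is a letter of rep or one fresh letter."""
    fresh = max(rep, default=-1) + 1
    return [rep + (x,) for x in range(fresh + 1)]

def symbolic_row(rep, member):
    """row(rep) as (value at ε, values on the letters of rep, value on a fresh letter)."""
    *on_letters, on_fresh = [int(member(t)) for t in extensions(rep)]
    return int(member(rep)), tuple(on_letters), on_fresh

S = [(), (0,), (0, 0)]                  # representatives of ε, a, aa
# Lower part of the symbolic table: representatives of S·A that are not already in S.
lower = [t for s in S for t in extensions(s) if t not in S]
row_a = symbolic_row((0,), in_L1)
```

Here `lower` consists of the representatives written ab, aaa and aab in the table, and `row_a` records row(a)(ε) = 0, row(a)(a) = 1 and the value 0 on any other letter.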

Notice that this step encompasses both Step 2 and 3, because
the rows b and bb added by Step 2 are already represented by a and
aa. The hypothesis automaton is

[Diagram: second hypothesis automaton, presented with the quantifier ∀a ∈ A]

This is again incorrect, but one additional step will give the correct
hypothesis automaton, shown earlier in (1).


2.3 Generalization to Non-Deterministic Automata 


Since our extension of Angluin’s L* algorithm stays close to her
original development, exploring extensions of other variations of L*
to the nominal setting can be done in a systematic way. We will show
how to extend the algorithm NL* for learning NFAs by Bollig et al.
This has practical implications: It is well-known that NFAs
can be exponentially more succinct than DFAs. This is true also in the
nominal setting. However, there are challenges in the extension that
require particular care.

• Nominal NFAs are strictly more expressive than nominal DFAs.
We will show that the nominal version of NL* terminates for
all nominal NFAs that have a corresponding nominal DFA and,
more surprisingly, that it is capable of learning some languages
that are not accepted by nominal DFAs.

• Language equivalence of nominal NFAs is undecidable. This
does not affect the correctness proof, as it assumes a teacher
which is able to answer equivalence queries accurately. For our
implementation, we will describe heuristics that produce correct
results in many cases.

For the learning algorithm the power of non-determinism means that
we can make some shortcuts during learning: If we want to make the
table closed, we were previously required to find an equivalent row
in the upper part; now we may find a sum of rows which, together,
are equivalent to an existing row. This means that in some cases
fewer rows will be added for closedness.


3. Preliminaries 


We recall the notions of nominal sets, nominal automata and nominal 
regular languages (see [9] for a detailed account). 

Let 𝔸 be a countable set of atoms and let Perm(𝔸) be the set of
permutations on 𝔸, i.e., the bijective functions π: 𝔸 → 𝔸. Permutations
form a group where the identity permutation id is the unit element,
inverse is functional inverse and multiplication is function
composition.

A nominal set is a set X plus a function ·: Perm(𝔸) ×
X → X, interpreting permutations over X. Such a function must
be a group action of Perm(𝔸), i.e., it must satisfy id · x = x and
π · (σ · x) = (π ∘ σ) · x. We say that a finite A ⊆ 𝔸 supports
x ∈ X whenever, for all π acting as the identity on A, we have
π · x = x. In other words, permutations that only move elements
outside A do not affect x. The support of x ∈ X, denoted supp(x),
is the smallest finite set supporting x. We require nominal sets to
have finite support, meaning that supp(x) exists for all x ∈ X.

The orbit orb(x) of x ∈ X is the set of elements in X reachable
from x via permutations, explicitly

$$\mathrm{orb}(x) = \{\pi \cdot x \mid \pi \in \mathrm{Perm}(\mathbb{A})\}.$$

Then X is orbit-finite whenever it is a union of finitely many orbits.

Given a nominal set X, a subset Y ⊆ X is equivariant if it
is preserved by permutations, i.e., π · y ∈ Y, for all y ∈ Y. In
other words, Y is a union of orbits of X. This definition extends
to the notion of an equivariant relation R ⊆ X × Y, by setting
π · (x, y) = (π · x, π · y), for (x, y) ∈ R; similarly for relations of
greater arity. The dimension of a nominal set X is the maximal size of
supp(x), for any x ∈ X. Every orbit-finite set has finite dimension.

We define 𝔸^(k) = {(a₁, …, aₖ) | aᵢ ≠ aⱼ for i ≠ j}, where 𝔸 is
the set of atoms. For every single-orbit nominal set X with dimension k,
there is a surjective equivariant map

$$f_X : \mathbb{A}^{(k)} \to X.$$

This map can be used to get an upper bound for the number of orbits
of X₁ × ⋯ × Xₙ, for Xᵢ a nominal set with lᵢ orbits and dimension
kᵢ. Suppose Oᵢ is an orbit of Xᵢ. Then we have a surjection

$$\mathbb{A}^{(k_1)} \times \cdots \times \mathbb{A}^{(k_n)} \to O_1 \times \cdots \times O_n,$$

which implies that the codomain cannot have more orbits than the
domain. Let f_𝔸({kᵢ}) denote the number of orbits of 𝔸^(k₁) × ⋯ ×
𝔸^(kₙ), for any finite sequence of natural numbers {kᵢ}. We can
form at most l = l₁l₂⋯lₙ tuples of the form O₁ × ⋯ × Oₙ, so
X₁ × ⋯ × Xₙ has at most l · f_𝔸(k₁, …, kₙ) orbits.
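
For the equality symmetry, the quantity f_𝔸 can be computed by brute force for small arguments. The following Python sketch rests on the assumption that two tuples of atoms lie in the same orbit exactly when they have the same pattern of equalities, so it enumerates tuples over a finite pool of atoms and counts distinct patterns; `orbit_count` is a hypothetical name.

```python
from itertools import permutations, product

def canonical(atoms):
    """Equality pattern of a tuple: rename atoms by order of first occurrence."""
    names = {}
    return tuple(names.setdefault(x, len(names)) for x in atoms)

def orbit_count(ks):
    """Number of orbits of A^(k1) x ... x A^(kn) for equality atoms, by brute force."""
    pool = range(sum(ks))        # enough atoms to realise every equality pattern
    # A^(k): tuples of k pairwise distinct atoms.
    blocks = [list(permutations(pool, k)) for k in ks]
    patterns = set()
    for combo in product(*blocks):
        flat = tuple(x for block in combo for x in block)
        patterns.add(canonical(flat))
    return len(patterns)

single = orbit_count((2,))       # A^(2) is a single orbit
pair = orbit_count((1, 1))       # pairs of atoms: equal or distinct
square = orbit_count((2, 2))
```

The enumeration grows quickly with the dimensions, so this is only meant to illustrate the bound, not to be used for large tables.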

For X single-orbit, the local symmetries are defined by the group

$$\{g \in S_k \mid f_X(x_1, \ldots, x_k) = f_X(x_{g(1)}, \ldots, x_{g(k)}) \text{ for all } (x_1, \ldots, x_k) \in \mathbb{A}^{(k)}\},$$

where k is the dimension of X and Sₖ is the symmetric group of
permutations over k distinct elements.

NFAs on sets have a finite state space. We can define nominal
NFAs, with the requirement that the state space is orbit-finite and
the transition relation is equivariant. A nominal NFA is a tuple
(Q, A, Q₀, F, δ), where:

• Q is an orbit-finite nominal set of states;
• A is an orbit-finite nominal alphabet;
• Q₀, F ⊆ Q are equivariant subsets of initial and final states;
• δ ⊆ Q × A × Q is an equivariant transition relation.

A nominal DFA is a special case of nominal NFA where Q₀ = {q₀}
and the transition relation is an equivariant function δ: Q × A → Q.
Equivariance here can be rephrased as requiring δ(π · q, π · a) =
π · δ(q, a). In most examples we take the alphabet to be the set of
atoms itself, A = 𝔸, but it can be any orbit-finite nominal set. For
instance, A = Act × 𝔸, where Act is a finite set of actions, represents
actions act(x) with one parameter x ∈ 𝔸 (actions with arity n can be
represented via n-fold products of 𝔸).
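
As an illustration of these definitions, the following Python sketch encodes a nominal DFA for the language {aa, bb, cc, ...} from Section 2.2 and tests the equivariance condition on a finite sample of states and letters. The state names follow the earlier example; treating q3 as the accepting state and q4 as a sink, as well as the encoding of states and permutations, are assumptions made for this sketch.

```python
def delta(state, x):
    """Transition function; states are "q0", ("q", a) for an atom a, "q3" and "q4"."""
    if state == "q0":
        return ("q", x)                          # remember the first letter
    if isinstance(state, tuple):
        return "q3" if x == state[1] else "q4"   # second letter must equal the stored one
    return "q4"                                  # q3 and q4 both move to the sink q4

def accepts(word):
    state = "q0"
    for x in word:
        state = delta(state, x)
    return state == "q3"

def act(pi, state):
    """Action of a permutation pi (a dict on atoms, identity elsewhere) on states."""
    return ("q", pi.get(state[1], state[1])) if isinstance(state, tuple) else state

pi = {1: 2, 2: 1}                                # the permutation swapping atoms 1 and 2
atoms = [1, 2, 3]
states = ["q0", "q3", "q4"] + [("q", a) for a in atoms]

# delta(pi·q, pi·x) == pi·delta(q, x) on the sampled states and letters.
equivariant = all(delta(act(pi, q), pi.get(x, x)) == act(pi, delta(q, x))
                  for q in states for x in atoms)
```

The state space is infinite (one state per atom), but it has only four orbits, and the transition function is given by a finite case distinction.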

A language ℒ is nominal regular if it is recognized by a
nominal DFA. The theory of nominal regular languages recasts
the classical one using nominal concepts. A nominal
Myhill-Nerode-style syntactic congruence is defined: w, w′ ∈ A* are
equivalent w.r.t. ℒ, written w ≡_ℒ w′, whenever

$$wv \in \mathcal{L} \iff w'v \in \mathcal{L}$$

for all v ∈ A*. This relation is equivariant and the set of equivalence
classes [w]_ℒ is a nominal set.


Theorem 1 (Myhill-Nerode theorem for nominal sets [9]). Let
ℒ be a regular nominal language. The following conditions are
equivalent:

1. the set of equivalence classes of ≡_ℒ is orbit-finite;

2. ℒ is recognized by a nominal DFA.


Unlike what happens for ordinary regular languages, nominal
NFAs and nominal DFAs are not equi-expressive. Here is an example
of a language accepted by a nominal NFA, but not by a nominal
DFA:

$$\mathcal{L}_{eq} = \{a_1 \ldots a_n \mid a_i = a_j \text{ for some } i < j \in \{1, \ldots, n\}\}$$

In the theory of nominal regular languages, several problems are
decidable: Language inclusion and minimality test for nominal DFAs.
Moreover, orbit-finite nominal sets can be finitely represented, and
so can be manipulated by algorithms. This is the key idea
underpinning our implementation.


3.1 Different Atom Symmetries 


An important advantage of nominal set theory as considered in [9] 
is that it retains most of its properties when the structure of atoms A 
is replaced with an arbitrary infinite relational structure subject to a 
few model-theoretic assumptions. An example alternative structure 
of atoms is the total order of rational numbers (Q, <), with the 
group of monotone bijections of Q taking the role of the group of 
all permutations. The theory of nominal automata remains similar, 
and an example nominal language over the atoms (Q, <) is: 


$$\{a_1 \ldots a_n \mid a_i < a_j \text{ for some } i < j \in \{1, \ldots, n\}\}$$


which is recognized by a nominal DFA over those atoms. 

To simplify the presentation, in this paper we concentrate on the 
“equality atoms” only. Also our implementation of nominal learning 
algorithms is restricted to equality atoms. However, both the theory 
and the implementation can be generalized to other atom structures, 
with the “ordered atoms” (Q, <) as the simplest other example. We 
leave the details of this for a future extended version of this paper. 


4. Angluin’s Algorithm for Nominal DFAs 


In our algorithm, we will assume a teacher as described at the start
of Section 2. In particular, the teacher is able to answer membership
queries and equivalence queries, now in the setting of nominal
languages. We fix a target language ℒ, which is assumed to be a
nominal regular language.

The learning algorithm for nominal automata, νL*, will be very
similar to L* in Figure 1. In fact, we only change the following lines:


 6′   S ← S ∪ orb(sa)
 9′   E ← E ∪ orb(ae)                  (2)
12′   S ← S ∪ prefixes(orb(t))


The basic data structure is an observation table (S, E, T) where S
and E are orbit-finite subsets of A* and T: (S ∪ S·A) × E → 2
is an equivariant function defined by T(s, e) = ℒ(se) for each
s ∈ S ∪ S·A and e ∈ E. Since T is determined by ℒ we omit it
from the notation. Let row: S ∪ S·A → 2^E denote the curried
counterpart of T. Let u ∼ v denote the relation row(u) = row(v).


Definition 1. The table is called closed if for each t ∈ S·A there is
an s ∈ S with t ∼ s. The table is called consistent if for each pair
s₁, s₂ ∈ S with s₁ ∼ s₂ we have s₁a ∼ s₂a for all a ∈ A.


The above definitions agree with the abstract definitions given in 
and we may use some of their results implicitly. The intuition 
behind the definitions is as follows. Closedness assures us that for 
each state we have a successor state for each input. Consistency 
assures us that each state has at most one successor for each input. 
Together they allow us to construct a well-defined minimal automaton
from the observations in the table. 

The algorithm starts with a trivial observation table and tries to 
make it closed and consistent by adding orbits of rows and columns, 
filling the table via membership queries. When the table is closed 
and consistent it constructs a hypothesis automaton and poses an 
equivalence query. 

The pseudocode for the nominal version is the same as listed in
Figure 1, modulo the changes displayed in (2). However, we have to
take care to ensure that all manipulations and tests on the (possibly)
infinite sets S, E and A terminate in finite time. We refer to [9]
and for the full details on how to represent these structures 
and provide a brief sketch here. The sets S, E, A and S·A can be
represented by choosing a representative for each orbit. The function
T in turn can be represented by cells Tᵢ,ⱼ: orb(sᵢ) × orb(eⱼ) → 2
for each representative sᵢ and eⱼ. Note, however, that the product
of two orbits may consist of several orbits, so that Tᵢ,ⱼ is not a
single boolean value. Each cell is still orbit-finite and can be filled
with only finitely many membership queries. Similarly the curried
function row can be represented by a finite structure.

To check whether the table is closed, we observe that if we have
a corresponding row s ∈ S for some t ∈ S·A, this holds for any
permutation of t. Hence it is enough to check the following: For
all representatives t ∈ S·A there is a representative s ∈ S with
row(t) = π · row(s) for some permutation π. Note that we only
have to consider finitely many permutations, since the support is
finite and so we can decide this property. Furthermore if the property
does not hold, we immediately find a witness represented by t.
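
A closedness test on representatives can be sketched in Python under simplifying assumptions: atoms are non-negative integers, representatives are tuples numbered in order of first occurrence, and the columns are fixed to ε together with all single letters, which matches the table of Step 2′ in Section 2.2. The function names are hypothetical, and this is not how NLambda represents tables.

```python
from itertools import permutations

def in_L1(word):
    """The example language {aa, bb, cc, ...} on tuples of atoms."""
    return len(word) == 2 and word[0] == word[1]

def same_row_up_to_permutation(s, t, member):
    """Is row(t) = pi·row(s) for some pi?  Columns assumed: ε and all single letters."""
    atoms = sorted(set(s) | set(t))
    fresh = max(atoms, default=-1) + 1           # stands for every atom outside s and t
    columns = [()] + [(x,) for x in atoms + [fresh]]
    for image in permutations(atoms):            # finitely many candidates for pi
        pi = dict(zip(atoms, image))
        # row(t) = pi·row(s) iff row(t)(pi·e) = row(s)(e) for every column e.
        if all(member(t + tuple(pi.get(x, x) for x in e)) == member(s + e)
               for e in columns):
            return True
    return False

def is_closed(S_reps, member):
    """Check closedness on orbit representatives only."""
    for s in S_reps:
        fresh = max(s, default=-1) + 1
        for x in range(fresh + 1):               # letters of s, plus one fresh letter
            t = s + (x,)
            if not any(same_row_up_to_permutation(u, t, member) for u in S_reps):
                return False                     # t represents a witness
    return True

rows_a_b = same_row_up_to_permutation((0,), (1,), in_L1)   # row(a) vs row(b)
closed_small = is_closed([()], in_L1)
closed_large = is_closed([(), (0,), (0, 0)], in_L1)
```

Restricting the search to permutations of the atoms occurring in s and t is sufficient here because both rows are supported by those atoms.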

Consistency is a bit more complicated, but it is enough to
consider the set of inconsistencies, {(s₁, s₂, a, e) | row(s₁) =
row(s₂) ∧ row(s₁a)(e) ≠ row(s₂a)(e)}. It is an equivariant
subset of S × S × A × E and so it is orbit-finite. Hence we can
decide emptiness and obtain representatives if it is non-empty.

Constructing the hypothesis happens in the same way as before
(Section 2), where we note the state space is orbit-finite since it is
a quotient of S. Moreover the function row is equivariant, so all
structure (Q₀, F and δ) is equivariant as well.

The representation given above is not the only way to represent
nominal sets. For example, first-order definable sets can be used as
well [26]. From now on we assume to have set-theoretic primitives
so that each line in Figure 1 is well defined.


4.1 Correctness 


To prove correctness we only have to prove that the algorithm 
terminates, that is, only finitely many hypotheses will be produced. 
Correctness follows trivially from termination since the last step 
of the algorithm is an equivalence query to the teacher inquiring 
whether a hypothesis automaton accepts the target language. We
start out by listing some facts about observation tables. 


Lemma 1. The relation ∼ is an equivariant equivalence relation.
Furthermore, for all u, v ∈ S we have that u ≡_ℒ v implies u ∼ v.


Lemma 1 implies that at any stage of the algorithm the number
of orbits of S/∼ does not exceed the number of orbits of the
minimal acceptor with state space A*/≡_ℒ (recall that ≡_ℒ is the
nominal Myhill-Nerode equivalence relation). Moreover, the following
lemma shows that the dimension of the state space never exceeds
the dimension of the minimal acceptor. Recall that the dimension is
the maximal size of the support of any state, which is different from
the number of orbits.


Lemma 2. We have supp([u]_∼) ⊆ supp([u]_≡ℒ) ⊆ supp(u) for
all u ∈ S.


Lemma 3. The automaton constructed from a closed and consistent 
table is minimal. 


Proof. Follows from the categorical perspective given in [24]. 


We note that the constructed automaton is consistent with the 
table (we use that the set S is prefix-closed and E is suffix-closed
BI). The following lemma shows that there are no strictly “smaller” 
automata consistent with the table. So the automaton is not just 
minimal, it is minimal w.r.t. the table. 


Lemma 4. Let H be the automaton associated with a closed and
consistent table (S, E). If M′ is an automaton consistent with
(S, E) (meaning that se ∈ ℒ(M′) ⟺ se ∈ ℒ(H) for all
s ∈ S ∪ S·A and e ∈ E) and M′ has at most as many orbits as H,
then there is a surjective map f: Q_M′ → Q_H. If moreover
• M′’s dimension is bounded by the dimension of H, i.e.
supp(m) ⊆ supp(f(m)) for all m ∈ Q_M′, and
• M′ has no fewer local symmetries than H, i.e. π · f(m) = f(m)
implies π · m = m for all m ∈ Q_M′,
then f defines an isomorphism M′ ≅ H of nominal DFAs.


Proof. (All maps in this proof are equivariant.) Define a map
$\mathrm{row}' : Q_{M'} \to 2^{E}$ by restricting the language map
$Q_{M'} \to 2^{A^*}$ to $E$. First, observe that
$\mathrm{row}'(\delta'(q'_0, s)) = \mathrm{row}(s)$ for all
$s \in S \cup S{\cdot}A$, since $\epsilon \in E$ and $M'$ is consistent with
the table. Second, we have
$\{\mathrm{row}'(\delta'(q'_0, s)) \mid s \in S\} \subseteq \{\mathrm{row}'(q) \mid q \in M'\}$.

Let $n$ be the number of orbits of $H$. The former set has $n$ orbits by
the first observation, the latter set has at most $n$ orbits by
assumption. We conclude that the two sets (both being equivariant) must be
equal. That means that for each $q \in M'$ there is an $s \in S$ such that
$\mathrm{row}'(q) = \mathrm{row}(s)$. We see that
$\mathrm{row}' : M' \to \{\mathrm{row}'(\delta'(q'_0, s)) \mid s \in S\} = H$
is a surjective map. Since a surjective map cannot increase the dimensions
of orbits and the dimensions of $M'$ are bounded, we note that the
dimensions of the orbits in $H$ and $M'$ have to agree. Similarly,
surjective maps preserve local symmetries. This map must hence be an
isomorphism of nominal sets. Note that
$\mathrm{row}'(q) = \mathrm{row}'(\delta'(q'_0, s))$ implies
$q = \delta'(q'_0, s)$.

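It remains to prove that it respects the automaton structures. It
preserves the initial state:
$\mathrm{row}'(q'_0) = \mathrm{row}'(\delta'(q'_0, \epsilon)) = \mathrm{row}(\epsilon)$.
Now let $q \in M'$ be a state and $s \in S$ such that
$\mathrm{row}'(q) = \mathrm{row}(s)$. It preserves final states:
$q \in F' \iff \mathrm{row}'(q)(\epsilon) = 1 \iff \mathrm{row}(s)(\epsilon) = 1$.
Finally, it preserves the transition structure:

$$\mathrm{row}'(\delta'(q, a)) = \mathrm{row}'(\delta'(\delta'(q'_0, s), a)) = \mathrm{row}'(\delta'(q'_0, sa)) = \mathrm{row}(sa) = \delta(\mathrm{row}(s), a). \qquad \Box$$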


The above proof is an adaptation of Angluin’s proof for automata 
over sets. We will now prove termination of the algorithm by proving 
that all steps are productive. 


Theorem 2. The algorithm terminates and is hence correct. 

Proof. Provided that the if-statements and set operations terminate,
we are left proving that the algorithm adds (orbits of) rows and 
columns only finitely often. We start by proving that a table can be 
made closed and consistent in finite time. 

If the table is not closed, we find a row $s_1 \in S{\cdot}A$ such that
$\mathrm{row}(s_1) \neq \mathrm{row}(s)$ for all $s \in S$. The algorithm
then adds the orbit containing $s_1$ to $S$. Since $s_1$ was nonequivalent
to all rows, we find that $(S \cup \mathrm{orb}(s_1))/{\sim}$ has strictly
more orbits than $S/{\sim}$. Since the orbits of $S/{\sim}$ cannot be more
than those of $A^*/{\equiv_{\mathcal{L}}}$, this happens finitely often.

Columns are added in case of an inconsistency. Here the algorithm finds
two elements $s_1, s_2 \in S$ with $\mathrm{row}(s_1) = \mathrm{row}(s_2)$
but $\mathrm{row}(s_1 a)(e) \neq \mathrm{row}(s_2 a)(e)$ for some
$a \in A$ and $e \in E$. Adding $ae$ to $E$ will ensure that
$\mathrm{row}'(s_1) \neq \mathrm{row}'(s_2)$ ($\mathrm{row}'$ is the
function belonging to the updated observation table). If the two elements
$\mathrm{row}'(s_1)$, $\mathrm{row}'(s_2)$ are in different orbits, the
number of orbits is increased. If they are in the same orbit, we have
$\mathrm{row}'(s_2) = \pi \cdot \mathrm{row}'(s_1)$ for some permutation
$\pi$. Using $\mathrm{row}(s_1) = \mathrm{row}(s_2)$ and
$\mathrm{row}'(s_1) \neq \mathrm{row}'(s_2)$ we have:

$$\mathrm{row}(s_1) = \pi \cdot \mathrm{row}(s_1) \qquad \mathrm{row}'(s_1) \neq \pi \cdot \mathrm{row}'(s_1)$$

Consider all such $\pi$ and suppose there is a $\pi$ and
$x \in \mathrm{supp}(\mathrm{row}(s_1))$ such that
$\pi \cdot x \notin \mathrm{supp}(\mathrm{row}(s_1))$. Then we find that
$\pi \cdot x \in \mathrm{supp}(\mathrm{row}'(s_1))$, and so the support of
the row has grown. By Lemma 2 this happens finitely often. Suppose such
$\pi$ and $x$ do not exist, then we consider the finite group
$R = \{\rho|_{\mathrm{supp}([s_1]_{\sim})} \mid \mathrm{row}(s_1) = \rho \cdot \mathrm{row}(s_1)\}$.
We see that
$\{\rho|_{\mathrm{supp}([s_1]_{\sim})} \mid \mathrm{row}'(s_1) = \rho \cdot \mathrm{row}'(s_1)\}$
is a proper subgroup of $R$. So, adding a column in this case decreases the
size of the group $R$, which can happen only finitely often. In this case a
local symmetry is removed.

In short, the algorithm will succeed in producing a hypothesis 
in each round. It remains to prove that it needs only finitely many 
equivalence queries. 

Let $(S, E)$ be the closed and consistent table and $H$ its corresponding
hypothesis. If it is incorrect a second hypothesis $H'$ will be
constructed which is consistent with the old table $(S, E)$. The two
hypotheses are nonequivalent, as $H'$ will handle the counterexample
correctly and $H$ does not. Therefore, $H'$ will have at least one orbit
more, one local symmetry less, or one orbit will have strictly bigger
dimension (Lemma 4), all of which can only happen finitely often.


We remark that all the lemmas and proofs as above are close to 
the original ones of Angluin. However, two things are crucially 
different. First, adding a column does not always increase the 
number of (orbits of) states. It can happen that by adding a column a 
bigger support is found or that a local symmetry is broken. Second, 
the new hypothesis does not necessarily have more states; again, it
might have bigger dimensions or fewer local symmetries.

From the proof of Theorem 2 we observe moreover that the way
we handle counterexamples is not crucial. Any other method which 
ensures a non-equivalent hypothesis will work. In particular our 
algorithm is easily adapted to include optimizations such as the ones 
in and , where counterexamples are added as columns. 


4.2 Example 


Consider the target automaton in Figure 2 and an observation table
$T_1$ at some stage during the algorithm. We remind the reader that the
table is represented in a symbolic way: The sequences in the rows 
and columns stand for whole orbits and the cells denote functions 
from the product of the orbits to 2. Since the cells can consist of 
multiple orbits, where each orbit is allowed to have a different value, 
we use a formula to specify which orbits have a 1. 


[Figure 2: automaton diagram and observation tables $T_1$, $T_2$, $T_3$.
Only $T_1$, which has the single column $\epsilon$, is reproduced here.]

$T_1$   $\epsilon$
$\epsilon$     0
a       0
ab      1
aa      0
aba     0
abb     0
abc     1

Figure 2. Example automaton to be learnt and three subsequent tables
computed by νL*. In the automaton, x, y, z denote distinct atoms. In
$T_3$ we only show a relevant column.


The table $T_1$ at some stage of the algorithm has to be checked for
closedness and consistency. We note that it is definitely closed. For
consistency we check the rows $\mathrm{row}(\epsilon)$ and
$\mathrm{row}(a)$ which are equal. Observe, however, that
$\mathrm{row}(\epsilon b)(\epsilon) = 0$ and
$\mathrm{row}(ab)(\epsilon) = 1$, so we have an inconsistency. The
algorithm adds the orbit $\mathrm{orb}(b)$ as column and extends the
table, obtaining $T_2$. We note that, in this process, the number of
orbits did grow, as the two rows are split. Furthermore we see that both
$\mathrm{row}(a)$ and $\mathrm{row}(ab)$ have empty support in $T_1$, but
not in $T_2$, because $\mathrm{row}(a)(a')$ depends on $a'$ being equal to
or different from $a$, and similarly for $\mathrm{row}(ab)(a')$.

The table $T_2$ is still not consistent as we see that
$\mathrm{row}(ab) = \mathrm{row}(ba)$ but $\mathrm{row}(abb)(c) = 1$ and
$\mathrm{row}(bab)(c) = 0$. Hence the algorithm adds the columns
$\mathrm{orb}(bc)$, obtaining table $T_3$. We note that in this case, no
new orbits are obtained and no support has grown. In fact, the only change
here is that the local symmetry between $\mathrm{row}(ab)$ and
$\mathrm{row}(ba)$ is removed. This last table, $T_3$, is closed and
consistent and will produce the correct hypothesis.


4.3 Query Complexity 


In this section, we will analyse the number of queries made by the
algorithm in the worst case. Let $M$ be the minimal target automaton
with $n$ orbits and of dimension $k$. We will use log in base two.


Lemma 5. The number of equivalence queries $E_{n,k}$ is $O(nk \log k)$.


Proof. By Lemma 4 each hypothesis will be either 1) bigger in the number
of orbits, which is bounded by $n$, or 2) bigger in the dimension of an
orbit, which is bounded by $k$, or 3) smaller in local symmetries of an
orbit. For the last part we want to know how long a subgroup series of the
permutation group $S_k$ can be. The order of each subgroup divides the
order of the group, so every proper step in such a series divides the
order by at least two. A series starting at $S_k$, which has order $k!$,
therefore has length at most $\log k! \le k \log k$, and so one can take a
proper subgroup at most $k \log k$ times when starting with $S_k$.

Since the hypothesis will grow monotonically in the number of orbits and
for each orbit will grow monotonically w.r.t. the remaining two
dimensions, the number of equivalence queries is bounded by
$n + n(k + k \log k)$.
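
The subgroup-series bound used in this proof can be checked numerically
for small $k$. The following Haskell sketch is only an illustration and is
not part of the implementation described later; the names
primeFactorCount, factorial, kLogK and equivalenceQueryBound are
hypothetical. It assumes the standard fact that a strictly descending
chain of subgroups of a group of order $m$ has at most as many proper
steps as $m$ has prime factors counted with multiplicity.

```haskell
-- Number of prime factors of m, counted with multiplicity.
-- Every proper step in a subgroup chain divides the order by at least
-- one prime, so this bounds the length of a chain in a group of order m.
primeFactorCount :: Integer -> Int
primeFactorCount = go 2
  where
    go p m
      | m < 2          = 0
      | p * p > m      = 1                       -- the remaining m is prime
      | m `mod` p == 0 = 1 + go p (m `div` p)
      | otherwise      = go (p + 1) m

-- Order of the symmetric group S_k.
factorial :: Integer -> Integer
factorial k = product [1 .. k]

-- The bound k * log2 k from the proof, rounded down.
kLogK :: Integer -> Int
kLogK k = floor (fromIntegral k * logBase 2 (fromIntegral k) :: Double)

-- The bound n + n (k + k log k) on the number of equivalence queries.
equivalenceQueryBound :: Integer -> Integer -> Integer
equivalenceQueryBound n k = n + n * (k + fromIntegral (kLogK k))
```

For instance, $4! = 24 = 2^3 \cdot 3$ has four prime factors, which is
below $4 \log 4 = 8$.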


Next we will give a bound for the size of the table. 


Lemma 6. The table has at most $n + mE_{n,k}$ orbits in $S$ with
sequences of at most length $n + m$, where $m$ is the length of the
longest counterexample given by the teacher. The table has at most
$n(k + k \log k + 1)$ orbits in $E$ of at most length $n(k + k \log k + 1)$.


Proof. In the termination proof we noted that rows are added at most $n$
times. In addition (all prefixes of) counterexamples are added as rows,
which adds another $mE_{n,k}$ rows. Obviously counterexamples are of
length at most $m$ and are extended at most $n$ times, making the length
at most $m + n$ in the worst case.

For columns we note that one of three dimensions approaches a bound
similarly to the proof of Lemma 5. So at most $n(k + k \log k + 1)$
columns are added. Since they are suffix-closed, the length is at most
$n(k + k \log k + 1)$.


Let p and l denote respectively the dimension and the number of 
orbits of A. 


Lemma 7. The number of orbits in the lower part of the table, $S{\cdot}A$,
is bounded by $(n + mE_{n,k})\, l\, f_{\mathbb{A}}(p(n + m), p)$.


Proof. Any sequence in $S$ is of length at most $n + m$, so it contains at
most $p(n + m)$ distinct atoms. When we consider $S{\cdot}A$, the extension
can either reuse atoms from those $p(n + m)$, or none at all. Since the
extra letter has at most $p$ distinct atoms, the set
$\mathbb{A}^{p(n+m)} \times \mathbb{A}^{p}$ gives a bound
$f_{\mathbb{A}}(p(n + m), p)$ for the number of orbits of $O_S \times O_A$,
with $O_X$ an orbit of $X$. Multiplying by the number of such ordered
pairs, namely $(n + mE_{n,k})\, l$, gives a bound for $S{\cdot}A$.


Let $C_{n,k,m} = (n + mE_{n,k})(l\, f_{\mathbb{A}}(p(n + m), p) + 1)\, n(k + k \log k + 1)$
be the maximal number of cells in the table. We note that this number is
polynomial in $k$, $l$, $m$ and $n$ but not in $p$.


Corollary 1. The number of membership queries is bounded by
$C_{n,k,m}\, f_{\mathbb{A}}(p(n + m), pn(k + k \log k + 1))$.


5. Learning Non-Deterministic Nominal Automata


In this section, we introduce a variant of νL*, which we call νNL*,
where the learnt automaton is non-deterministic. It will be based
on NL* [11], an Angluin-style algorithm for learning NFAs. The
algorithm is shown in Figure 3. We first illustrate NL*, then we
discuss its extension to nominal automata.

NL* crucially relies on the use of residual finite-state automata
(RFSA) [19], which are NFAs admitting unique minimal canonical
representatives. The states of such an automaton correspond to
Myhill-Nerode right-congruence classes, but the automaton can be
exponentially smaller than the corresponding minimal DFA: Composed
states, language-equivalent to sets of other states, can be dropped. The
algorithm NL* equips the observation table $(S, E)$ with a union
operation, allowing for the detection of composed and prime rows.


Definition 2. Let $(\mathrm{row}(s_1) \sqcup \mathrm{row}(s_2))(e) = \mathrm{row}(s_1)(e) \lor \mathrm{row}(s_2)(e)$
(regarding cells as booleans). This operation induces an ordering between
rows: $\mathrm{row}(s_1) \sqsubseteq \mathrm{row}(s_2)$ whenever
$\mathrm{row}(s_1)(e) = 1$ implies $\mathrm{row}(s_2)(e) = 1$, for all
$e \in E$.
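
For the ordinary (finite) setting of NL*, the two operations of
Definition 2 can be sketched in Haskell as follows. This is a minimal
illustration under the assumption that the columns $E$ are fixed in some
order and a row is stored as the list of its cells; the names Row,
joinRows and isContainedIn are hypothetical.

```haskell
-- A row over a fixed, finite list of columns: one Boolean cell per column.
type Row = [Bool]

-- Union of two rows: cell-wise disjunction.
joinRows :: Row -> Row -> Row
joinRows = zipWith (||)

-- Containment: every cell that is set in the first row is also set in
-- the second row.
isContainedIn :: Row -> Row -> Bool
isContainedIn r1 r2 = and (zipWith (\x y -> not x || y) r1 r2)
```

With this representation, a row r1 is contained in r2 exactly when
joinRows r1 r2 equals r2.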


NL* LEARNER

 1  $S, E \leftarrow \{\epsilon\}$
 2  repeat
 3      while $(S, E)$ is not RFSA-closed or not RFSA-consistent
 4          if $(S, E)$ is not RFSA-closed
 5              find $s \in S$, $a \in A$ such that
                    $\mathrm{row}(sa) \in \mathrm{PR}(S, E) \setminus \mathrm{PR}^{\top}(S, E)$
 6              $S \leftarrow S \cup \{sa\}$
 7          if $(S, E)$ is not RFSA-consistent
 8              find $s_1, s_2 \in S$, $a \in A$, and $e \in E$ such that
                    $\mathrm{row}(s_1) \sqsubseteq \mathrm{row}(s_2)$ and
                    $\mathcal{L}(s_1 a e) = 1$, $\mathcal{L}(s_2 a e) = 0$
 9              $E \leftarrow E \cup \{ae\}$
10      Make the conjecture $N(S, E)$
11      if the Teacher replies no, with a counterexample $t$
12          $E \leftarrow E \cup \mathrm{suffixes}(t)$
13  until the Teacher replies yes to the conjecture $N(S, E)$.
14  return $N(S, E)$


Figure 3. Bollig et al.'s algorithm for learning NFAs


A row $\mathrm{row}(s)$ is composed if
$\mathrm{row}(s) = \mathrm{row}(s_1) \sqcup \dots \sqcup \mathrm{row}(s_n)$,
for $\mathrm{row}(s_i) \neq \mathrm{row}(s)$. Otherwise it is prime. We
denote by $\mathrm{PR}^{\top}(S, E)$ the rows in the top part of the table
(ranging over $S$) which are prime w.r.t. the whole table (not only w.r.t.
the top part). We write $\mathrm{PR}(S, E)$ for all the prime rows of
$(S, E)$.

As in L*, states of hypothesis automata will be rows of $(S, E)$
but, as the aim is to construct a minimal RFSA, only prime rows are
picked. New notions of closedness and consistency are introduced,
to reflect features of RFSAs.


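Definition 3. A table $(S, E)$ is:
• RFSA-closed if, for all $t \in S{\cdot}A$,
  $\mathrm{row}(t) = \bigsqcup \{\mathrm{row}(s) \in \mathrm{PR}^{\top}(S, E) \mid \mathrm{row}(s) \sqsubseteq \mathrm{row}(t)\}$;
• RFSA-consistent if, for all $s_1, s_2 \in S$ and $a \in A$,
  $\mathrm{row}(s_1) \sqsubseteq \mathrm{row}(s_2)$ implies
  $\mathrm{row}(s_1 a) \sqsubseteq \mathrm{row}(s_2 a)$.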
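
For a finite table, prime rows and the RFSA-closedness condition can be
computed directly. The Haskell sketch below is an illustration for the
ordinary (non-nominal) case only; it assumes rows are lists of Boolean
cells in a fixed column order, takes the upper rows (indexed by $S$) and
lower rows (indexed by $S{\cdot}A$) as plain lists, and uses hypothetical
names throughout.

```haskell
import Data.List (nub)

type Row = [Bool]

-- Union of a list of rows of the given width (all-False for no rows).
joinAll :: Int -> [Row] -> Row
joinAll width = foldr (zipWith (||)) (replicate width False)

isContainedIn :: Row -> Row -> Bool
isContainedIn r1 r2 = and (zipWith (\x y -> not x || y) r1 r2)

-- A row is prime if it is not the union of the rows strictly below it.
-- In particular the all-False row is composed (the empty union).
isPrime :: [Row] -> Row -> Bool
isPrime allRows r = joinAll (length r) below /= r
  where below = [r' | r' <- allRows, r' `isContainedIn` r, r' /= r]

-- Upper rows that are prime w.r.t. the whole table.
primeUpperRows :: [Row] -> [Row] -> [Row]
primeUpperRows upper lower = nub [r | r <- upper, isPrime (upper ++ lower) r]

-- RFSA-closedness: every lower row is the union of the prime upper rows
-- it contains.
isRfsaClosed :: [Row] -> [Row] -> Bool
isRfsaClosed upper lower = all covered lower
  where
    primes    = primeUpperRows upper lower
    covered t = joinAll (length t) [r | r <- primes, r `isContainedIn` t] == t
```

In the nominal setting the lists would be replaced by orbit-finite sets,
and the union may range over infinitely many rows, which is why the
checks need the more careful treatment given in Section 5.2.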




If $(S, E)$ is not RFSA-closed, then there is a row in the bottom
part of the table which is prime, but not contained in the top
part. This row is then added to $S$ (line 5). If $(S, E)$ is not
RFSA-consistent, then there is a suffix which does not preserve the
containment of two existing rows, so those rows are actually
incomparable. A new column is added to distinguish those rows
(line 8). Notice that counterexamples supplied by the teacher are
added to columns (line 12). Indeed, in it is shown that treating
the counterexamples as in the original L*, namely adding them to
rows, does not lead to a terminating algorithm.


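Definition 4. Given an RFSA-closed and RFSA-consistent table $(S, E)$,
the conjecture automaton is $N(S, E) = (Q, Q_0, F, \delta)$, where:

• $Q = \mathrm{PR}^{\top}(S, E)$;

• $Q_0 = \{r \in Q \mid r \sqsubseteq \mathrm{row}(\epsilon)\}$;

• $F = \{r \in Q \mid r(\epsilon) = 1\}$;

• the transition relation $\delta \subseteq Q \times A \times Q$ is given by
  $\delta(\mathrm{row}(s), a) = \{r \in Q \mid r \sqsubseteq \mathrm{row}(sa)\}$.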
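
Definition 4 translates almost literally into code for finite tables. The
following Haskell sketch is an illustration for the ordinary case, under
several assumptions: rows are lists of Boolean cells whose first column is
$\epsilon$, letters are characters, the prime upper rows are supplied by
the caller, and the names NFA, conjecture and rowOf are hypothetical.

```haskell
import Data.List (nub)

type Row = [Bool]   -- cells in column order; the first column is ε

data NFA = NFA
  { states      :: [Row]
  , initials    :: [Row]
  , finals      :: [Row]
  , transitions :: [(Row, Char, Row)]
  } deriving (Eq, Show)

isContainedIn :: Row -> Row -> Bool
isContainedIn r1 r2 = and (zipWith (\x y -> not x || y) r1 r2)

-- primes:   the prime upper rows, i.e. the state set Q
-- alphabet: the (finite) input alphabet
-- rowOf:    maps a word to its row in the table
-- upper:    the words indexing the upper part of the table
conjecture :: [Row] -> [Char] -> (String -> Row) -> [String] -> NFA
conjecture primes alphabet rowOf upper = NFA
  { states      = primes
  , initials    = [r | r <- primes, r `isContainedIn` rowOf ""]
  , finals      = [r | r@(True : _) <- primes]   -- cell for ε is set
  , transitions = nub
      [ (rowOf s, a, r)
      | s <- upper, rowOf s `elem` primes
      , a <- alphabet
      , r <- primes, r `isContainedIn` rowOf (s ++ [a])
      ]
  }
```

As a usage example, take the language of words over {a, b} ending in a,
with columns $\epsilon$ and a, and upper words $\epsilon$ and a; the
resulting automaton has two states and six transitions.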


As observed in [11], $N(S, E)$ is not necessarily an RFSA, but it
is a canonical RFSA if it is consistent with $(S, E)$. If the algorithm
terminates, then $N(S, E)$ must be consistent with $(S, E)$, which
ensures correctness. The termination argument is more involved
than that of L*, but it still relies on the minimal DFA.

Developing an algorithm to learn nominal NFAs is not an obvious
extension of NL*: Non-deterministic nominal languages strictly
contain nominal regular languages, so it is not clear what the
developed algorithm should be able to learn. To deal with this, we
introduce a nominal notion of RFSAs. They are a proper subclass of
nominal NFAs, because they recognize nominal regular languages.
Nonetheless, they are more succinct than nominal DFAs.


5.1 Nominal Residual Finite-State Automata 


Let $\mathcal{L}$ be a nominal regular language and let $u$ be a finite
string. The derivative of $\mathcal{L}$ w.r.t. $u$ is
$u^{-1}\mathcal{L} = \{v \in A^* \mid uv \in \mathcal{L}\}$. A set
$\mathcal{L}' \subseteq A^*$ is a residual of $\mathcal{L}$ if there is
$u$ with $\mathcal{L}' = u^{-1}\mathcal{L}$. Note that a residual might not
be equivariant, but it does have a finite support. We write
$R(\mathcal{L})$ for the set of residuals of $\mathcal{L}$. Residuals form
an orbit-finite nominal set: They are in bijection with the state-space
of the minimal nominal DFA for $\mathcal{L}$.
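
To make the notion of derivative concrete, the following Haskell sketch
represents a language by its membership predicate. It is an illustration
only; the names Language, derivative and firstEqualsLast are hypothetical,
and the alphabet is taken to be Char as a stand-in for an infinite set of
atoms.

```haskell
-- A language over letters of type a, given by its membership predicate.
type Language a = [a] -> Bool

-- The derivative of l w.r.t. u contains v exactly when uv is in l.
derivative :: [a] -> Language a -> Language a
derivative u l = \v -> l (u ++ v)

-- Example: non-empty words whose first and last letters coincide.
-- The predicate only compares letters for equality.
firstEqualsLast :: Language Char
firstEqualsLast []        = False
firstEqualsLast w@(x : _) = x == last w
```

Here firstEqualsLast treats all letters alike, whereas the residual
derivative "a" firstEqualsLast singles out the letter a: it accepts the
empty word and every word ending in a. This mirrors the remark that a
residual need not be equivariant but has a finite support.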

A nominal residual finite-state automaton for $\mathcal{L}$ is a nominal
NFA whose states are subsets of the state space of this minimal
automaton. Given a state $q$ of an automaton, we write $\mathcal{L}(q)$
for the set of words leading from $q$ to a set of states containing a
final one.


Definition 5. A nominal residual finite-state automaton (nominal
RFSA) is a nominal NFA $\mathcal{A}$ such that
$\mathcal{L}(q) \in R(\mathcal{L}(\mathcal{A}))$, for all
$q \in Q_{\mathcal{A}}$.


Intuitively, all states of a nominal RFSA recognize residuals, but
not all residuals are recognized by a single state: There may be
a residual $\mathcal{L}'$ and a set of states $Q'$ such that
$\mathcal{L}' = \bigcup_{q \in Q'} \mathcal{L}(q)$, but no state $q'$ is
such that $\mathcal{L}(q') = \mathcal{L}'$. A residual $\mathcal{L}'$ is
called composed if it is equal to the union of the components it strictly
contains, explicitly

$$\mathcal{L}' = \bigcup \{\mathcal{L}'' \in R(\mathcal{L}) \mid \mathcal{L}'' \subsetneq \mathcal{L}'\};$$

otherwise it is called prime. In an ordinary RFSA, composed
residuals have finitely many components. This is not the case in a
nominal RFSA. However, the set of components of $\mathcal{L}'$ always has
a finite support, namely $\mathrm{supp}(\mathcal{L}')$.

The set of prime residuals $\mathrm{PR}(\mathcal{L})$ is an orbit-finite
nominal set, and can be used to define a canonical nominal RFSA for
$\mathcal{L}$, which has the minimal number of states and the maximal
number of transitions. This can be regarded as obtained from the minimal
nominal DFA, by removing composed states and adding all initial
states and transitions that do not change the recognized language.
This automaton is necessarily unique.


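Lemma 8. Let the canonical nominal RFSA of $\mathcal{L}$ be
$(Q, Q_0, F, \delta)$ such that:

• $Q = \mathrm{PR}(\mathcal{L})$;

• $Q_0 = \{\mathcal{L}' \in Q \mid \mathcal{L}' \subseteq \mathcal{L}\}$;

• $F = \{\mathcal{L}' \in Q \mid \epsilon \in \mathcal{L}'\}$;

• $\delta(\mathcal{L}_1, a) = \{\mathcal{L}_2 \in Q \mid \mathcal{L}_2 \subseteq a^{-1}\mathcal{L}_1\}$.

It is a well-defined nominal NFA accepting $\mathcal{L}$.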


5.2 νNL*


Our nominal version of NL* again makes use of an observation table
$(S, E)$ where $S$ and $E$ are equivariant subsets of $A^*$ and
$\mathrm{row}$ is an equivariant function. As in the basic algorithm, we
equip $(S, E)$ with a union operation $\sqcup$ and row containment
relation $\sqsubseteq$, defined as in Definition 2. It is immediate to
verify that $\sqcup$ and $\sqsubseteq$ are equivariant.

Our algorithm is a simple modification of the algorithm in
Figure 3, where a few lines are replaced:


 6'   $S \leftarrow S \cup \mathrm{orb}(sa)$
 9'   $E \leftarrow E \cup \mathrm{orb}(ae)$
12'   $E \leftarrow E \cup \mathrm{suffixes}(\mathrm{orb}(t))$


Switching to nominal sets, several decidability issues arise. The
most critical one is that rows may be the union of infinitely many
component rows, as happens for residuals of nominal languages,
so finding all such components can be challenging. We adapt the
notion of composed to rows: $\mathrm{row}(t)$ is composed whenever

$$\mathrm{row}(t) = \bigsqcup \{\mathrm{row}(s) \mid \mathrm{row}(s) \sqsubset \mathrm{row}(t)\},$$

where $\sqsubset$ is strict row inclusion; otherwise $\mathrm{row}(t)$ is
prime. We now check that relevant parts of our algorithm terminate.


Row Containment Check. The basic containment check
$\mathrm{row}(s) \sqsubseteq \mathrm{row}(t)$ is decidable, as
$\mathrm{row}(s)$ and $\mathrm{row}(t)$ are supported by the finite
supports of $s$ and $t$ respectively.


Line 3: RFSA-Closedness and RFSA-Consistency Checks. We
first show that prime rows form orbit-finite nominal sets.


Lemma 9. $\mathrm{PR}(S, E)$, $\mathrm{PR}^{\top}(S, E)$ and
$\mathrm{PR}(S, E) \setminus \mathrm{PR}^{\top}(S, E)$ are orbit-finite
nominal sets.


Consider now RFSA-closedness. It requires computing the set
$C(\mathrm{row}(t))$ of components of $\mathrm{row}(t)$ contained in
$\mathrm{PR}^{\top}(S, E)$ (possibly including $\mathrm{row}(t)$). This may
not be equivariant under permutations $\mathrm{Perm}(\mathbb{A})$, but it
is if we pick a subgroup.


Lemma 10. The set $C(\mathrm{row}(t))$ has the following properties:
1. $\mathrm{supp}(C(\mathrm{row}(t))) \subseteq \mathrm{supp}(\mathrm{row}(t))$.
2. it is equivariant and orbit-finite under the action of the group

$$G_t = \{\pi \in \mathrm{Perm}(\mathbb{A}) \mid \pi|_{\mathrm{supp}(\mathrm{row}(t))} = \mathrm{id}\}$$

   of permutations fixing $\mathrm{supp}(\mathrm{row}(t))$.


We established that $C(\mathrm{row}(t))$ can be effectively computed, and
the same holds for $\bigsqcup C(\mathrm{row}(t))$. In fact, $\bigsqcup$ is
equivariant w.r.t. the whole $\mathrm{Perm}(\mathbb{A})$ and then, in
particular, w.r.t. $G_t$, so it preserves orbit-finiteness. Now, to check
$\mathrm{row}(t) = \bigsqcup C(\mathrm{row}(t))$, we can just pick one
representative of every orbit of $S{\cdot}A$, because we have
$C(\pi \cdot \mathrm{row}(t)) = \pi \cdot C(\mathrm{row}(t))$ and
permutations distribute over $\sqcup$, so permuting both sides of the
equation gives again a valid equation.

For RFSA-consistency, consider the two sets:

$$N = \{(s_1, s_2) \in S \times S \mid \mathrm{row}(s_1) \sqsubseteq \mathrm{row}(s_2)\}$$
$$M = \{(s_1, s_2) \in S \times S \mid \forall a \in A : \mathrm{row}(s_1 a) \sqsubseteq \mathrm{row}(s_2 a)\}$$

They are both orbit-finite nominal sets, by equivariance of
$\mathrm{row}$, $\sqsubseteq$ and $A$. We can check RFSA-consistency in
finite time by picking orbit representatives from $N$ and $M$. For each
representative $n \in N$, we look for a representative $m \in M$ and a
permutation $\pi$ such that $n = \pi \cdot m$. If no such $m$ and $\pi$
exist, then $n$ does not belong to any orbit of $M$, so it violates
RFSA-consistency.


Lines 5 and 8: Finding Witnesses for Violations. We can find
witnesses by comparing orbit representatives of orbit-finite sets, as
we did with RFSA-consistency. Specifically, we can pick representatives
in $S \times A$ and $S \times S \times A \times E$ and check them against
the following orbit-finite nominal sets:

• $\{(s, a) \in S \times A \mid \mathrm{row}(sa) \in \mathrm{PR}(S, E) \setminus \mathrm{PR}^{\top}(S, E)\}$;

• $\{(s_1, s_2, a, e) \in S \times S \times A \times E \mid \mathrm{row}(s_1 a)(e) = 1,\ \mathrm{row}(s_2 a)(e) = 0,\ \mathrm{row}(s_1) \sqsubseteq \mathrm{row}(s_2)\}$.


5.3 Correctness 


Now we prove correctness and termination of the algorithm. First, 
we prove that hypothesis automata are nominal NFAs. 


Lemma 11. The hypothesis automaton $N(S, E)$ (see Definition 4)
is a nominal NFA.


$N(S, E)$, as in ordinary NL*, is not always a nominal RFSA.
However, we have the following.


Theorem 3. If the table $(S, E)$ is RFSA-closed, RFSA-consistent
and $N(S, E)$ is consistent with $(S, E)$, then $N(S, E)$ is a canonical
nominal RFSA.


This is proved in for ordinary RFSAs, using the standard
theory of regular languages. The nominal proof is exactly the same,
using derivatives of nominal regular languages and nominal RFSAs
as defined in Section 5.1.


Lemma 12. The table (S, E) cannot have more than n orbits of 
distinct rows, where n is the number of orbits of the minimal nominal 
DFA for the target language. 


Proof. Rows are residuals of $\mathcal{L}$, which are states of the
minimal nominal DFA for $\mathcal{L}$, so orbits cannot be more than $n$.


Theorem 4. The algorithm νNL* terminates and returns the canonical
nominal RFSA for $\mathcal{L}$.


Proof. If the algorithm terminates, then it must return the canonical
nominal RFSA for $\mathcal{L}$ by Theorem 3. We prove that a table can
be made RFSA-closed and RFSA-consistent in finite time. This is
similar to the proof of Theorem 2 and is inspired by the proof
Theorem 3].

If the table is not RFSA-closed, we find a row $s \in S{\cdot}A$
such that $\mathrm{row}(s) \in \mathrm{PR}(S, E) \setminus \mathrm{PR}^{\top}(S, E)$.
The algorithm then adds $\mathrm{orb}(s)$ to $S$. Since $s$ was
nonequivalent to all upper prime rows, and thus to all the rows indexed
by $S$, we find that $(S \cup \mathrm{orb}(s))/{\sim}$ has strictly more
orbits than $S/{\sim}$ (recall that
$s \sim t \iff \mathrm{row}(s) = \mathrm{row}(t)$). This addition can only
be done finitely many times, because the number of orbits of $S/{\sim}$ is
bounded, by Lemma 12.

Now, the case of RFSA-consistency needs some additional
notions. Let $R$ be the (orbit-finite) nominal set of all rows, and
let $I = \{(r, r') \in R \times R \mid r \sqsubset r'\}$ be the set of all
inclusion relations among rows. The set $I$ is orbit-finite. In fact,
consider

$$J = \{(s, t) \in (S \cup S{\cdot}A) \times (S \cup S{\cdot}A) \mid \mathrm{row}(s) \sqsubset \mathrm{row}(t)\}$$

This set is an equivariant, thus orbit-finite, subset of
$(S \cup S{\cdot}A) \times (S \cup S{\cdot}A)$. The set $I$ is the image of
$J$ via $\mathrm{row} \times \mathrm{row}$, which is equivariant, so it
preserves orbit-finiteness.

Now, suppose the algorithm finds two elements $s_1, s_2 \in S$ with
$\mathrm{row}(s_1) \sqsubseteq \mathrm{row}(s_2)$ but
$\mathrm{row}(s_1 a)(e) = 1$ and $\mathrm{row}(s_2 a)(e) = 0$ for some
$a \in A$ and $e \in E$. Adding a column to fix RFSA-consistency may:
C1) increase orbits of $(S \cup S{\cdot}A)/{\sim}$, or; C2) decrease orbits
of $I$, or; C3) decrease local symmetries/increase dimension of one orbit
of rows. In fact, if no new orbits of rows are added (that is, C1 does not
occur), we have two cases.

• If $\mathrm{row}(s_1) \sqsubset \mathrm{row}(s_2)$, i.e.,
  $(\mathrm{row}(s_1), \mathrm{row}(s_2)) \in I$, then
  $\mathrm{row}'(s_1) \not\sqsubseteq \mathrm{row}'(s_2)$, where
  $\mathrm{row}'$ is the row function of the new table. Therefore the
  orbit of $(\mathrm{row}'(s_1), \mathrm{row}'(s_2))$ is not in $I$.
  Moreover, $\mathrm{row}'(s) \sqsubset \mathrm{row}'(t)$ implies
  $\mathrm{row}(s) \sqsubset \mathrm{row}(t)$ (as no new rows are added),
  so no new pairs are added to $I$. Overall, $I$ has fewer orbits (C2).

• If $\mathrm{row}(s_1) = \mathrm{row}(s_2)$, then we must have
  $\mathrm{row}(s_1) = \pi \cdot \mathrm{row}(s_1)$, for some $\pi$,
  because line 5 forbids equal rows in different orbits. In this case
  $\mathrm{row}'(s_1) \neq \pi \cdot \mathrm{row}'(s_1)$ and we can use
  part of the proof of Theorem 2 to see that the orbit of
  $\mathrm{row}'(s_1)$ has bigger dimension or fewer local symmetries
  than that of $\mathrm{row}(s_1)$ (C3).

Orbits of $(S \cup S{\cdot}A)/{\sim}$ and of $I$ are finitely many, by
Lemma 12 and what we proved above. Moreover, local symmetries can decrease
finitely many times, and the dimension of each orbit of rows
is bounded by the dimension of the minimal DFA state-space.
Therefore all the above changes can happen finitely many times.

We have proved that the table eventually becomes RFSA-closed
and RFSA-consistent. Now we prove that a finite number of equivalence
queries is needed to reach the final hypothesis automaton.
To do this, we cannot use a suitable version of Lemma 4, because
this relies on $N(S, E)$ being consistent with $(S, E)$, which in general
is not true (see for an example of this). We can, however,
use an argument similar to that for RFSA-consistency, because the
algorithm adds columns in response to counterexamples. Let $w$ be
the counterexample provided by the teacher. When line 12' is executed,
the table must change. In fact, by Lemma 2], if it does not,
then $w$ is already correctly classified by $N(S, E)$, which is absurd.
We have the following cases. E1) orbits of $(S \cup S{\cdot}A)/{\sim}$
increase (C1). Or, E2) either: Orbits in $\mathrm{PR}(S, E)$ increase, or
any of the following happens: Orbits in $I$ decrease (C2), local
symmetries/dimension of an orbit of rows change (C3). In fact, if E1 does
not happen and $\mathrm{PR}(S, E)$, $I$ and local symmetries/dimension of
orbits of rows do not change, the automaton $\mathcal{A}$ for the new table
coincides with $N(S, E)$. But $N(S, E) = \mathcal{A}$ is a contradiction,
because $\mathcal{A}$ correctly classifies $w$ (by Lemma 2], as $w$ now
belongs to columns), whereas $N(S, E)$ does not. Both E1 and E2 can only
happen finitely many times.


5.4 Query Complexity 


We now give bounds for the number of equivalence and membership
queries needed by νNL*. Let $n$ be the number of orbits of the
minimal DFA $M$ for the target language and let $k$ be the dimension
(i.e., the size of the maximum support) of its nominal set of states.


Lemma 13. The number of equivalence queries $E_{n,k}$ is
$O(n^2 f_{\mathbb{A}}(k, k) + nk \log k)$.


Proof. In the proof of Theorem 4, we saw that equivalence queries
lead to more orbits in $(S \cup S{\cdot}A)/{\sim}$ or in
$\mathrm{PR}(S, E)$, fewer orbits in $I$, or fewer local symmetries/bigger
dimension for an orbit. Clearly the first two can happen at most $n$
times. We now estimate how many times $I$ can decrease. Suppose
$(S \cup S{\cdot}A)/{\sim}$ has $d$ orbits and $h$ orbits are added to it.
Recall that, given an orbit $O$ of rows of dimension at most $m$,
$f_{\mathbb{A}}(m, m)$ is an upper bound for the number of orbits in the
product $O \times O$. Since the support of rows is bounded by $k$, we can
give a bound for the number of orbits added to $I$:
$dh\, f_{\mathbb{A}}(k, k)$, for new pairs $r \sqsubset r'$ with $r$ in a
new orbit of rows and $r'$ in an old one (or vice versa); plus
$(h(h - 1)/2)\, f_{\mathbb{A}}(k, k)$, for $r$ and $r'$ both in (distinct)
new orbits; plus $h\, f_{\mathbb{A}}(k, k)$, for $r$ and $r'$ in the same
new orbit. Notice that, if $\mathrm{PR}(S, E)$ grows but
$(S \cup S{\cdot}A)/{\sim}$ does not, $I$ does not increase. By Lemma 12,
$h, d \le n$, so $I$ cannot decrease more than
$(n^2 + n(n - 1)/2 + n)\, f_{\mathbb{A}}(k, k)$ times.

Local symmetries of an orbit of rows can decrease at most $k \log k$
times (see proof of Lemma 5), and its dimension can increase at
most $k$ times. Therefore $n(k + k \log k)$ is a bound for all the orbits
of rows, which are at most $n$, by Lemma 12. Summing up, we get
the main result.


Lemma 14. Let $m$ be the length of the longest counterexample
given by the teacher. Then the table has:

• at most $n$ orbits in $S$, with words of length at most $n$;

• at most $mE_{n,k}$ orbits in $E$, with words of length at most $mE_{n,k}$.


Proof. By Lemma 12, the number of orbits of rows indexed by $S$
is at most $n$. Now, notice that line 5 does not add $\mathrm{orb}(sa)$ to
$S$ if $sa \in S$, and lines 12 and 9 cannot identify rows, so $S$ has at
most $n$ orbits. The length of the longest word in $S$ must be at most
$n$, as $S = \{\epsilon\}$ when the algorithm starts, and line 6' adds
words with one more symbol than those in $S$.

For columns, we note that both fixing RFSA-consistency and
adding counterexamples increase the number of columns, but this
can happen at most $E_{n,k}$ times (see proof of Lemma 13). Each time
at most $m$ suffixes are added to $E$.


We compute the maximum number of cells as in Section 4.3.


Lemma 15. The number of orbits in the lower part of the table,
$S{\cdot}A$, is bounded by $nl\, f_{\mathbb{A}}(pn, p)$.


Then $C_{n,k,m} = n(l\, f_{\mathbb{A}}(pn, p) + 1)\, mE_{n,k}$ is the
maximal number of cells in the table. This bound is polynomial in $n$, $m$
and $l$, but not in $k$ and $p$.


Corollary 2. The number of membership queries is bounded by
$C_{n,k,m}\, f_{\mathbb{A}}(pn, pmE_{n,k})$.


6. Implementation and Preliminary Experiments


Our algorithms for learning nominal automata operate on infinite
sets of rows and columns, and hence it is not immediately clear
how to actually implement them on a computer. We have used
NLambda [26], our recently developed Haskell library designed
to allow direct manipulation of infinite (but orbit-finite) nominal
sets, within the functional programming paradigm. The semantics of
NLambda is based on [B], and the library itself is inspired by Fresh
O’Caml , a language for functional programming over nominal
data structures with binding.


6.1 NLambda 


NLambda extends Haskell with a new type Atoms. Values of this 
type are atomic values that can be compared for equality and have 
no other discernible structure. They correspond to the elements of 
the infinite alphabet A described in Section} 

Furthermore, NLambda provides a unary type constructor Set.
This appears similar to the Data.Set type constructor from
the standard Haskell library, but its semantics is markedly different:
Whereas the latter is used to construct finite sets, the former has 
orbit-finite sets as values. The new constructor Set can be applied to 
a range of equality types that include Atoms, but also the tuple type 
(Atoms, Atoms), the list type [Atoms], the set type Set Atoms, and 
other types that provide basic infrastructure necessary to speak 
of supports and orbits. All these are instances of a type class 
NominalType specified in NLambda for this purpose. 

NLambda, in addition to all the standard machinery of Haskell,
offers primitives to manipulate values of any nominal types τ, σ:

• empty : Set τ, returns the empty set of any type;

• atoms : Set Atoms, returns the (infinite but single-orbit) set of
  all atoms;

• insert : τ → Set τ → Set τ, adds an element to a set;

• map : (τ → σ) → (Set τ → Set σ), applies a function to every
  element of a set;

• sum : Set (Set τ) → Set τ, computes the union of a family of sets;

• isEmpty : Set τ → Formula, checks whether a set is empty.

The type Formula takes the role of a Boolean type. For technical
reasons it is distinct from the standard Haskell type Bool, but it
provides standard logical operations such as


    not : Formula → Formula
    or : Formula → Formula → Formula,


as well as a conditional operator ite : Formula → τ → τ → τ
that mimics the standard if construction. It is also the result type of a
built-in equality test on atoms, eq : Atoms → Atoms → Formula.
Using these primitives, one builds more functions to operate on
orbit-finite sets, such as a function to build singleton sets:


    singleton : τ → Set τ
    singleton x = insert x empty


or a filtering function to select elements that satisfy a given predicate:


    filter : (τ → Formula) → Set τ → Set τ
    filter p s = sum (map (λx. ite (p x) (singleton x) empty) s)


or functions to quantify a predicate over a set:


    exists, forall : (τ → Formula) → Set τ → Formula
    exists p s = not (isEmpty (filter p s))
    forall p s = isEmpty (filter (λx. not (p x)) s)


and so on. Note that these functions are written in exactly the same
way as they would be for finite sets and the standard Data.Set
type. This is not an accident, and indeed the programmer can use
the convenient set-theoretic intuition of NLambda primitives. For
example, one could conveniently construct various orbit-finite sets
such as the set of all pairs of atoms:


    atomPairs = sum (map (λx. map (λy. (x, y)) atoms) atoms),


the set of all pairs of distinct atoms:


    distPairs = filter (λ(x, y). not (eq x y)) atomPairs


and so on.
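
To illustrate the remark that these definitions look the same for finite
sets, here is a sketch of the finite analogue written against the standard
Data.Set module from the containers package. It does not use NLambda: the
set atoms is replaced by a finite pool of integers (an assumption made
purely for illustration), Formula is replaced by Bool, and the names
sumSet, filterSet and forAll are hypothetical.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Union of a set of sets, playing the role of the primitive sum.
sumSet :: Ord a => Set (Set a) -> Set a
sumSet = Set.unions . Set.toList

singleton :: Ord a => a -> Set a
singleton x = Set.insert x Set.empty

-- Filtering, written in the same style as the nominal definition.
filterSet :: Ord a => (a -> Bool) -> Set a -> Set a
filterSet p s =
  sumSet (Set.map (\x -> if p x then singleton x else Set.empty) s)

exists, forAll :: Ord a => (a -> Bool) -> Set a -> Bool
exists p s = not (Set.null (filterSet p s))
forAll p s = Set.null (filterSet (\x -> not (p x)) s)

-- A finite stand-in for the infinite set of atoms.
atoms :: Set Int
atoms = Set.fromList [0 .. 3]

atomPairs :: Set (Int, Int)
atomPairs = sumSet (Set.map (\x -> Set.map (\y -> (x, y)) atoms) atoms)

distPairs :: Set (Int, Int)
distPairs = filterSet (\(x, y) -> x /= y) atomPairs
```

With four atoms, atomPairs has 16 elements and distPairs has 12. In the
nominal version the same expressions denote infinite sets with two orbits
and one orbit, respectively.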

It should be stressed that all these constructions terminate in finite 
time, even though they formally involve infinite sets. To achieve this, 
values of orbit-finite set types Set 7 are internally not represented 
as lists or trees of elements of type 7. Instead, they are stored and 
manipulated symbolically, using first-order formulas over variables 
that range over atom values. For example, the value of distPairs 
above is stored as the formal expression: 


{(a,b) | a,b E A, a £ b} 


or, more specifically, as a triple: 
e a pair (a, b) of “atom variables”, 


e a list [a, b] of those atom variables that are bound in the expres- 
sion (in this case, the expression contains no free variables), 


e a formula a Æ b over atom variables. 

All the primitives listed above, such as isEmpty, map and sum, 
are implemented on this internal representation. In some cases, 
this involves checking the satisfiability of certain formulas over 
atoms. In the current implementation of NLambda, an external SMT 
solver Z3 is used for that purpose. For example, to evaluate the 
expression isEmpty distPairs, NLambda makes a system call to 
the SMT solver to check whether the formula a +Æ b is satisfiable in 
the first-order theory of equality and, after receiving the affirmative 
answer, returns the value False. 
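The symbolic representation and the emptiness check can be sketched in plain Haskell as follows. This is an illustration under our own assumptions, not NLambda's internal data type: the names Formula, SymSet, satisfiable and isEmptySym are hypothetical, the set is assumed to be closed (no free variables), and the call to Z3 is replaced by a brute-force search. The search relies on the fact that a quantifier-free equality formula with n variables is satisfiable over an infinite set of atoms if and only if it is satisfiable using at most n distinct values.

```haskell
import Data.List (nub)

type Var = String

-- Quantifier-free formulas over atom variables, with equality only.
data Formula
  = T
  | F
  | Eq Var Var
  | Not Formula
  | And Formula Formula
  | Or Formula Formula

-- A symbolic set {shape | bound variables range over atoms, condition}.
data SymSet = SymSet
  { shape     :: [Var]    -- tuple of atom variables, e.g. (a, b)
  , bound     :: [Var]    -- variables bound by the set expression
  , condition :: Formula  -- e.g. a /= b
  }

-- Evaluate a formula under an assignment of values to variables.
eval :: [(Var, Int)] -> Formula -> Bool
eval _   T         = True
eval _   F         = False
eval env (Eq x y)  = lookup x env == lookup y env
eval env (Not f)   = not (eval env f)
eval env (And f g) = eval env f && eval env g
eval env (Or f g)  = eval env f || eval env g

-- Variables occurring in a formula (possibly with repetitions).
vars :: Formula -> [Var]
vars (Eq x y)  = [x, y]
vars (Not f)   = vars f
vars (And f g) = vars f ++ vars g
vars (Or f g)  = vars f ++ vars g
vars _         = []

-- Brute-force satisfiability: try all assignments into n values,
-- where n is the number of distinct variables.
satisfiable :: Formula -> Bool
satisfiable f = any (\env -> eval env f) envs
  where
    vs   = nub (vars f)
    envs = mapM (\v -> [(v, k) | k <- [1 .. length vs]]) vs

-- A closed symbolic set is empty iff its condition is unsatisfiable.
isEmptySym :: SymSet -> Bool
isEmptySym s = not (satisfiable (condition s))

-- The set of pairs of distinct atoms, and an obviously empty set.
distPairsSym, emptySym :: SymSet
distPairsSym = SymSet ["a", "b"] ["a", "b"] (Not (Eq "a" "b"))
emptySym     = SymSet ["a", "b"] ["a", "b"] (And (Eq "a" "b") (Not (Eq "a" "b")))
```

As in the text, the condition of distPairsSym is satisfiable, so isEmptySym reports that the set is not empty. The exhaustive search is exponential in the number of variables and only serves to make the idea concrete.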

For more details about the semantics and implementation of 
NLambda, see . The library itself can be downloaded from : 


6.2 Implementation of νL* and νNL*


Using NLambda we implemented the algorithms from Sections 4
and 5. We note that the internal representation is slightly
different from the one discussed in Section 4. Instead of representing
the table (S, E) with actual representatives of orbits, the sets are
represented logically as described above. Furthermore, the control
flow of the algorithm is adapted to fit the functional programming
paradigm. In particular, recursion is used instead of a while loop. In
addition to the nominal adaptation of Angluin’s algorithm νL*, we
implemented a variant, νL*_col, which adds counterexamples to the
columns instead of rows.

Target automata are defined using NLambda as well, using the 
automaton data type provided by the library. Membership queries 
are already implemented by the library. Equivalence queries are 
implemented by constructing a bisimulation (recall that bisimulation 
implies language equivalence), where a counterexample is obtained 
when two DFAs are not bisimilar. For nominal NFAs, however, we 
cannot implement a complete equivalence query as their language 
equivalence is undecidable. We approximated the equivalence by 
bounding the depth of the bisimulation for nominal NFAs. As an 
optimization, we use bisimulation up to congruence. Having
an approximate teacher is a minor issue since in many applications
no complete teacher can be implemented and one relies on testing
[2, 12]. For the experiments listed here the bound was chosen large
enough for the learner to terminate with the correct automaton.

Model                 DFA orbits  DFA dim  νL* (s)  νL*_col (s)  RFSA orbits  RFSA dim  νNL* (s)
$\mathrm{FIFO}_0$     2           0        1.9      1.9          2            0         2.4
$\mathrm{FIFO}_1$     3           1        12.9     7.4          3            1         17.3
$\mathrm{FIFO}_2$     5           2        45.6     22.6         5            2         70.3
$\mathrm{FIFO}_3$     10          3        189      107          10           3         476
$\mathrm{FIFO}_4$     25          4        370      267          25           4         1230
$\mathrm{FIFO}_5$     77          5        1337     697          ∞            ∞         ∞
$\mathcal{L}_0$       2           0        1.3      1.4          2            0         1.4
$\mathcal{L}_1$       4           1        29.6     4.7          4            1         8.9
$\mathcal{L}_2$       7           2        229      23.1         7            2         84.7
$\mathcal{L}'_0$      3           1        4.4      4.9          3            1         11.3
$\mathcal{L}'_1$      5           1        15.4     15.4         4            1         66.4
$\mathcal{L}'_2$      9           1        46.3     40.5         5            1         210
$\mathcal{L}'_3$      17          1        89.0     66.8         6            1         566
$\mathcal{L}_{eq}$    n/a         n/a      n/a      n/a          3            1         16.3

Table 1. Results of experiments. The columns DFA (resp. RFSA)
show the number of orbits and the dimension of the learnt
minimal DFA (resp. canonical RFSA). We use ∞ when the running
time is too high.
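The idea of an approximate equivalence query with a depth bound can be sketched on ordinary finite DFAs. The code below is not our NLambda implementation: it works on finite alphabets, does not use bisimulation up to congruence, and the names DFA, boundedEquiv, evenAs and evenLen are hypothetical. It explores pairs of states breadth-first and returns a shortest distinguishing word if one exists within the bound.

```haskell
import qualified Data.Set as Set

-- A finite DFA with states of type s over letters of type a.
data DFA s a = DFA
  { initial   :: s
  , accepting :: s -> Bool
  , step      :: s -> a -> s
  }

-- Breadth-first construction of a bisimulation on pairs of states.
-- Words of length at most depthBound are examined; Nothing means that
-- no counterexample was found within the bound.
boundedEquiv :: (Ord s, Ord t) => [a] -> Int -> DFA s a -> DFA t a -> Maybe [a]
boundedEquiv alphabet depthBound m n = go 0 Set.empty [(initial m, initial n, [])]
  where
    go _ _ [] = Nothing
    go depth seen frontier
      | depth > depthBound = Nothing
      | otherwise =
          case [reverse w | (p, q, w) <- frontier, accepting m p /= accepting n q] of
            (w : _) -> Just w
            []      -> go (depth + 1) seen' next
      where
        -- Pairs visited so far, including the current frontier.
        seen' = foldr (\(p, q, _) -> Set.insert (p, q)) seen frontier
        -- Successor pairs that have not been visited yet.
        next  = dedup Set.empty
                  [ (step m p a, step n q a, a : w)
                  | (p, q, w) <- frontier, a <- alphabet ]
        dedup _ [] = []
        dedup acc ((p, q, w) : rest)
          | (p, q) `Set.member` seen' || (p, q) `Set.member` acc = dedup acc rest
          | otherwise = (p, q, w) : dedup (Set.insert (p, q) acc) rest

-- Example: words with an even number of 'a's versus words of even length.
evenAs :: DFA Bool Char
evenAs = DFA False not (\s c -> if c == 'a' then not s else s)

evenLen :: DFA Int Char
evenLen = DFA 0 (== 0) (\s _ -> (s + 1) `mod` 2)
```

For finite DFAs the search terminates even without a bound, because only finitely many pairs of states exist. The bound becomes essential in settings such as nominal NFAs, where no complete check is available and the teacher is only approximate.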

We remark that our algorithms seamlessly merge with teachers 
written in NLambda, but the current version of the library does not 
allow generating concrete membership queries for external teachers. 
We are currently working on a new version of the library in which 
this will be possible. 


6.3 Test Cases 


To provide a benchmark for future improvements, we tested our 
algorithms on a few simple automata described below. We report 
results in Table 1. The experiments were performed on a machine
with an Intel Core i5 (Skylake, 2.4 GHz) and 8 GB RAM.


Queue Data Structure. A queue is a data structure to store elements
which can later be retrieved in a first-in, first-out order.
It has two operations: push and pop. We define the alphabet
$\Sigma_{\mathrm{FIFO}} = \{\mathrm{push}(a), \mathrm{pop}(a) \mid a \in \mathbb{A}\}$. The language $\mathrm{FIFO}_n$
contains all valid traces of push and pop using a bounded queue of
size n. The minimal nominal DFA for $\mathrm{FIFO}_2$ is

[Figure: state diagram of the minimal nominal DFA for $\mathrm{FIFO}_2$]

The state reached from $q_{1,x}$ via push(x) is omitted: Its outgoing
transitions are those of $q_{2,x,y}$, where y is replaced by x. Similar
benchmarks appear in [2, 23].
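The language of the bounded queue can be made concrete with a small membership test. This is a sketch under stated assumptions: atoms are modelled as Int, a push onto a full queue and a pop of the wrong (or of no) element both make the trace invalid, and the names Op and inFIFO are ours.

```haskell
-- Letters of the alphabet: push(a) and pop(a) for an atom a.
data Op = Push Int | Pop Int deriving (Eq, Show)

-- Membership in FIFO_n: the word must be a valid trace of a queue
-- that holds at most n elements at any time.
inFIFO :: Int -> [Op] -> Bool
inFIFO n = go []
  where
    -- The first argument is the current queue content, oldest element first.
    go _ [] = True
    go queue (Push a : rest)
      | length queue < n = go (queue ++ [a]) rest
      | otherwise        = False
    go (front : queue) (Pop a : rest)
      | a == front = go queue rest
    go _ _ = False
```

The queue content plays the role of the registers of the automaton: a state of the minimal nominal DFA corresponds to the number of stored atoms together with the equalities among them, plus one sink state for invalid traces.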

Double Word. $\mathcal{L}_n = \{ww \mid w \in \mathbb{A}^n\}$ from Section 2.
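A membership test for the double word language is a one-liner. The sketch below is only a specification of the language, with a hypothetical name inDoubleWord; it says nothing about how the automaton stores the first half of the word.

```haskell
-- Membership in L_n = { ww | w has length n }.
inDoubleWord :: Eq a => Int -> [a] -> Bool
inDoubleWord n w = length w == 2 * n && take n w == drop n w
```

A deterministic acceptor has to remember the first n atoms and their mutual equalities before it can compare them with the second half.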

NFA. Consider the language $\mathcal{L}_{eq} = \bigcup_{a \in \mathbb{A}} \mathbb{A}^* a \mathbb{A}^* a \mathbb{A}^*$ of words
where some letter appears twice. This is accepted by an NFA which
guesses the position of the first occurrence of a repeated letter a and
then waits for the second a to appear. The language is not accepted
by a DFA [9]. Despite this, νNL* is able to learn the automaton:

[Figure: state diagram of the learnt NFA for $\mathcal{L}_{eq}$]

where the transition from $q_0$ to $q_{1,x}$ is defined as $\delta(q_0, a) = \{q_{1,b} \mid b \in \mathbb{A}\}$.
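The verbal description of the guessing NFA can be simulated directly. The automaton below is our own illustration of that description and is not claimed to be the automaton learnt by νNL*: atoms are modelled as Int, and the names State, delta, acceptsNFA and inLeq are hypothetical. A direct specification of the language is included for comparison.

```haskell
import Data.List (nub)

-- Q0: nothing guessed yet; Q1 b: guessed that b repeats; Q2: repetition seen.
data State = Q0 | Q1 Int | Q2 deriving (Eq, Show)

-- Non-deterministic transition function.
delta :: State -> Int -> [State]
delta Q0 a = [Q0, Q1 a]          -- keep waiting, or guess that a will repeat
delta (Q1 b) a
  | a == b    = [Q2]             -- the guessed letter appeared again
  | otherwise = [Q1 b]           -- keep waiting for the guessed letter
delta Q2 _ = [Q2]

-- Run the NFA by tracking the set of reachable states.
acceptsNFA :: [Int] -> Bool
acceptsNFA = elem Q2 . foldl stepAll [Q0]
  where
    stepAll states a = nub (concatMap (`delta` a) states)

-- Direct specification: some letter appears at least twice.
inLeq :: [Int] -> Bool
inLeq w = length (nub w) < length w
```

After reading a word without repetitions, the set of reachable states contains one state Q1 b for every letter b seen so far. A deterministic automaton would need to store all of them, which is why no nominal DFA with finitely many orbits accepts this language.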


n-last Position. A prototypical example of regular languages which
are accepted by very small NFAs is the set of words where a
distinguished symbol a appears on the n-last position [11]. We
define a similar nominal language $\mathcal{L}'_n = \bigcup_{a \in \mathbb{A}} a \mathbb{A}^* a \mathbb{A}^n$. To
accept such words non-deterministically, one simply guesses the
n-last position. This language is also accepted by a much larger
deterministic automaton.
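As a specification of this language, the following sketch checks that the first letter of a word reappears at the position followed by exactly n further letters. The name inNLast is ours.

```haskell
-- Membership in the n-last position language: words of the form a u a v
-- with v of length exactly n.
inNLast :: Eq a => Int -> [a] -> Bool
inNLast n w = case w of
  (a : rest) | length rest > n -> rest !! (length rest - 1 - n) == a
  _                            -> False
```

The test needs the whole word before it can decide, which mirrors the guess made by the NFA. A deterministic automaton instead has to track, for each of the last n + 1 positions, whether the letter read there equals the first letter.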


7. Related Work 


This section compares νL* with other algorithms from the literature.
We stress that no comparison is possible for νNL*, as it is the
first learning algorithm for non-deterministic automata over infinite 
alphabets. 

The first to consider learning automata over infinite alphabets
was Sakamoto [38]. In his work the problem is reduced to L* with
some finite sub-alphabet. The sub-alphabet grows in stages and L*
is rerun at every stage, until the alphabet is big enough to capture the
whole language. In Sakamoto’s approach, any learning algorithm
can be used as a back-end. This, however, comes at a cost: It has to
be rerun at every stage, and each symbol is treated in isolation,
which might require more queries. Our algorithm νL*, instead,
works with the whole alphabet from the very start, and it exploits
its symmetry. An example is in Sections 2.1 and 2.2. The ordinary
learner uses four equivalence queries, whereas the nominal one,
using the symmetry, only needs three. Moreover, our algorithm is
easier to generalize to other alphabets and computational models,
such as non-determinism.

More recently, papers appeared on learning register automata.
Their register automata are as expressive as our deterministic
nominal automata. The state-space is similar to our orbit-wise
representation: It is formed by finitely many locations with registers.
Transitions are defined symbolically using propositional logic. We
remark that the most recent paper [15] generalizes the algorithm to
alphabets with different structures (which correspond to different
atom symmetries in our work), but at the cost of changing Angluin’s
framework. Instead of membership queries, the algorithm requires
more sophisticated tree queries. In our approach, using a different
symmetry affects neither the algorithm nor its correctness
proof. Tree queries can be reduced to membership queries by
enumerating all n-types for some n (n-types in logic correspond to
orbits in the set of n-tuples). Keeping that in mind, their complexity
results are roughly the same as ours, although this is hard to verify,
as they do not give bounds on the length of individual tree queries.
Finally, our approach lends itself better to extension to other
variations on L* (of which many exist), as it is closer to Angluin’s
original work.
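For atoms with equality only, the orbits of n-tuples mentioned above can be enumerated explicitly: an orbit is determined by which positions of the tuple hold equal atoms. The sketch below lists one canonical representative per orbit. The names orbits and orbitOf are ours, and the code covers the equality symmetry only.

```haskell
import Data.List (nub)

-- One representative per orbit of n-tuples of atoms. In a representative,
-- the first atom is 0 and every new atom gets the next unused number.
orbits :: Int -> [[Int]]
orbits n = go n 0
  where
    -- `fresh` is the number of distinct atoms used so far.
    go 0 _ = [[]]
    go k fresh =
      [ x : rest
      | x <- [0 .. fresh]
      , rest <- go (k - 1) (max fresh (x + 1)) ]

-- The canonical representative of the orbit of a concrete tuple.
orbitOf :: Eq a => [a] -> [Int]
orbitOf xs = map index xs
  where
    distinct = nub xs
    index x  = length (takeWhile (/= x) distinct)
```

The number of orbits of n-tuples is the number of partitions of an n-element set, so it grows quickly with n. This gives a sense of the cost hidden in a reduction from tree queries to membership queries.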

Another class of learning algorithms for systems with large
alphabets is based on abstraction and refinement, which is orthogonal
to the approach in the present paper, but connections and possible
transference of techniques are worth exploring in the future. In [2],
the alphabet is reduced to a finite alphabet of abstractions, and L*
for ordinary DFAs over such a finite alphabet is used. Abstractions are
refined by counterexamples. Other similar approaches are [20, 22],
where global and local per-state abstractions of the alphabet are used,
and [30, 32], where the alphabet can also have additional structure
(e.g., an ordering relation). We can also mention [14], a framework
for learning symbolic models of software behavior.

In [5, 6], the authors cope with an infinite alphabet by running L*
(adapted to Mealy machines) using a finite approximation of the
alphabet, which may be augmented when equivalence queries are
answered. A smaller symbolic model is derived subsequently. Their
approach, unlike ours, does not exploit the symmetry over the full
alphabet. The symmetry allows our algorithm to reduce queries and
to produce the smallest possible automaton at every step.

Finally, we compare with results on session automata.
Session automata are defined over finite alphabets, just like the work
by Sakamoto. However, session automata are more restrictive than
deterministic nominal automata. For example, the model cannot
capture an acceptor for the language of words where consecutive
data values are distinct. This language can be accepted by a
three-orbit nominal DFA, which can be learned by our algorithm.
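A deterministic acceptor for the language of words with distinct consecutive data values can be sketched as follows. This is our own illustration, with atoms modelled as Int and hypothetical names State, step and accepts. Its three constructors correspond to three orbits of states: the initial state, the states remembering the last value, and a sink.

```haskell
-- Start: nothing read yet; Last a: the previous value was a; Sink: rejected.
data State = Start | Last Int | Sink deriving (Eq, Show)

-- Deterministic transition function.
step :: State -> Int -> State
step Start a = Last a
step (Last b) a
  | a == b    = Sink      -- two equal consecutive values
  | otherwise = Last a
step Sink _ = Sink

-- Every state except the sink is accepting.
accepts :: [Int] -> Bool
accepts w = foldl step Start w /= Sink
```

The orbit of states Last a has dimension one, since a single register holding the previous value suffices.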

We implemented our algorithms in the nominal library NLambda 
as sketched before. Other implementation options include Fresh 
O’Caml [39], a functional programming language designed for 
programming over nominal data structures with binding, and 
LOIS [27, 28], a C++ library for imperative nominal programming.
We chose NLambda for its convenient set-theoretic primitives, but
the other options remain to be explored. In particular, the low-level
LOIS could be expected to provide more efficient implementations.


8. Discussion and Future Work 


In this paper we defined and implemented extensions of several
versions of L* and of NL* for nominal automata.
We highlight two features of our approach:

• it has strong theoretical foundations: The theory of nominal
languages, covering different alphabets and symmetries (see
Section 3); category theory, where nominal automata have
been characterized as coalgebras and many properties
and algorithms (e.g., minimization) have been studied at this
abstract level.

• it follows a generic pattern for transporting computation models
and algorithms from finite sets to nominal sets, which leads to
simple correctness proofs.

These features pave the way to several extensions and improvements. 

Future work includes a general version of νNL*, parametric in
the notion of side-effect (an example is non-determinism). Different
notions will yield models with different degrees of succinctness w.r.t.
deterministic automata. The key observation here is that many forms
of non-determinism and other side effects can be captured via the
categorical notion of monad, i.e., an algebraic structure, on the
state-space. Monads allow generalizing the notion of composed and
prime state: A state is composed whenever it is obtained from other
states via an algebraic operation. Our algorithm νNL* is based on
the powerset monad, representing classical non-determinism. We
are currently investigating a substitution monad, where the operation
is “applying a (possibly non-injective) substitution of atoms in the
support”. A minimal automaton over this monad, akin to a RFSA,
will have states that can generate all the states of the associated
minimal DFA via a substitution, but cannot be generated by other
states (they are prime). For instance, we can give an automaton over
the substitution monad that recognizes $\mathcal{L}_2$ from Section 2:

[Figure: automaton over the substitution monad recognizing $\mathcal{L}_2$]

Here [y ↦ a] means that, if that transition is taken, $q_{xy}$ (hence
its language) is subject to y ↦ a. In general, the size of the
minimal DFA for $\mathcal{L}_n$ grows more than exponentially with n, but an
automaton with substitutions on transitions, like the one above, only
needs O(n) states.

In principle, thanks to the generic approach we have taken, all
our algorithms should work for various kinds of atoms with more
structure than just equality, as advocated in [9]. Details, such as
precise assumptions on the underlying structure of atoms necessary for
proofs to go through, remain to be checked. For an implementation
of automata learning over other kinds of atoms without compromising
the generic approach, an extension of NLambda to those atoms
will be needed, as the current version of the library only supports
equality and totally ordered atoms.

The efficiency of our current implementation, as measured in
Section 6.3, leaves much to be desired. There is plenty of potential
for running time optimization, ranging from improvements in the
learning algorithms themselves, to optimizations in the NLambda library
(such as replacing the external and general-purpose SMT solver
with a purpose-built, internal one, or a tighter integration of nominal
mechanisms with the underlying Haskell language as it was done
in [39]), to giving up the functional programming paradigm for an
imperative language such as LOIS [27, 28].